Paul Thomas [Sat, 26 Aug 2023 13:37:49 +0000 (14:37 +0100)]
Fortran: Supply a missing dereference [PR92586]
2023-08-26 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/92586
* trans-expr.cc (gfc_trans_arrayfunc_assign): Supply a missing
dereference for the call to gfc_deallocate_alloc_comp_no_caf.
gcc/testsuite/
PR fortran/92586
* gfortran.dg/pr92586.f90 : New test
There is a redundant vsetvli instruction in VLA vectorized codes which is the VSETVL PASS issue.
vsetvl issue is not included in this patch but will be fixed soon.
gcc/ChangeLog:
* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.
Andrew Pinski [Sat, 26 Aug 2023 02:10:52 +0000 (19:10 -0700)]
Fix phi-opt-34.c testcase
Somehow when I was testing the new testcase, it was working but
when I re-ran the full testsuite it was not. Anyways the issue
was just a simple space before the `}` for dg-options directive.
Edwin Lu [Fri, 25 Aug 2023 23:35:43 +0000 (16:35 -0700)]
RISC-V: Add Types to Un-Typed Sync Instructions:
Updates the sync instructions to ensure that no insn is left without
a type attribute. Updates a total of 9 insns to have type "atomic"
or type "multi" based on number of assembly instructions generated
Tested for regressions using rv32/64 multilib with newlib/linux.
gcc/Changelog:
* config/riscv/sync-rvwmo.md: updated types to "multi" or
"atomic" based on number of assembly lines generated
* config/riscv/sync-ztso.md: likewise
* config/riscv/sync.md: likewise
Reviewed-by: Jeff Law <jlaw@ventanamicro.com> Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Jeff Law [Fri, 25 Aug 2023 22:34:17 +0000 (16:34 -0600)]
RISC-V: Make stack_save_restore tests more robust
Spurred by Jivan's patch and a desire for cleaner testresults, I went ahead and
make the stack_save_restore tests independent of the precise stack size by
using a regexp.
Jin Ma [Fri, 25 Aug 2023 21:34:40 +0000 (15:34 -0600)]
[PATCH v10] RISC-V: Add support for the Zfa extension
This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb
The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html
What needs special explanation is:
1, According to riscv-spec, "The FCVTMO D.W.D instruction was added principally to
accelerate the processing of JavaScript Numbers.", so it seems that no implementation
is required.
2, The instructions FMINM and FMAXM correspond to C23 library function fminimum and fmaximum.
Therefore, this patch has simply implemented the pattern of fminm<hf\sf\df>3 and
fmaxm<hf\sf\df>3 to prepare for later.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add zfa extension version, which depends on
the F extension.
* config/riscv/constraints.md (zfli): Constrain the floating point number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is
not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is
required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate
in assembly.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm<mode>3): New.
(fmaxm<mode>3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_zfa): New.
* config/riscv/riscv.opt: New.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fli-1.c: New test.
* gcc.target/riscv/zfa-fli-2.c: New test.
* gcc.target/riscv/zfa-fli-3.c: New test.
* gcc.target/riscv/zfa-fli-4.c: New test.
* gcc.target/riscv/zfa-fli-6.c: New test.
* gcc.target/riscv/zfa-fli-7.c: New test.
* gcc.target/riscv/zfa-fli-8.c: New test.
Co-authored-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Sandra Loosemore [Thu, 24 Aug 2023 17:35:01 +0000 (17:35 +0000)]
OpenMP: Fortran support for imperfectly-nested loops
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop. In GCC this code is moved
into the inner loop body by the respective front ends.
In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.
gcc/fortran/ChangeLog
* gfortran.h (struct gfc_namespace): Add omp_structured_block bit.
* openmp.cc: Include omp-api.h.
(resolve_omp_clauses): Consolidate inscan reduction clause conflict
checking here.
(find_nested_loop_in_chain): New.
(find_nested_loop_in_block): New.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
Handle imperfectly-nested loops when looking for nested omp scan.
Refactor to move inscan reduction clause conflict checking to
resolve_omp_clauses.
(gfc_resolve_do_iterator): Handle imperfectly-nested loops.
(struct icode_error_state): New.
(icode_code_error_callback): New.
(icode_expr_error_callback): New.
(diagnose_intervening_code_errors_1): New.
(diagnose_intervening_code_errors): New.
(make_structured_block): New.
(restructure_intervening_code): New.
(is_outer_iteration_variable): Do not assume loops are perfectly
nested.
(check_nested_loop_in_chain): New.
(check_nested_loop_in_block_state): New.
(check_nested_loop_in_block_symbol): New.
(check_nested_loop_in_block): New.
(expr_uses_intervening_var): New.
(is_intervening_var): New.
(expr_is_invariant): Do not assume loops are perfectly nested.
(resolve_omp_do): Handle imperfectly-nested loops.
* trans-stmt.cc (gfc_trans_block_construct): Generate
OMP_STRUCTURED_BLOCK if magic bit is set on block namespace.
Sandra Loosemore [Thu, 24 Aug 2023 17:35:00 +0000 (17:35 +0000)]
OpenMP: C++ support for imperfectly-nested loops
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop. In GCC this code is moved
into the inner loop body by the respective front ends.
This patch changes the C++ front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an
iterative approach, in order to preserve proper nesting of compound
statements. Preserving cleanups (destructors) for class objects
declared in intervening code and loop initializers complicates moving
the former into the body of the loop; this is handled by parsing the
entire construct before reassembling any of it.
gcc/cp/ChangeLog
* cp-tree.h (cp_convert_omp_range_for): Adjust declaration.
* parser.cc (struct omp_for_parse_data): New.
(cp_parser_postfix_expression): Diagnose calls to OpenMP runtime
in intervening code.
(check_omp_intervening_code): New.
(cp_parser_statement_seq_opt): Special-case nested loops, blocks,
and other constructs for OpenMP loops.
(cp_parser_iteration_statement): Reject loops in intervening code.
(cp_parser_omp_for_loop_init): Expand comments and tweak the
interface slightly to better distinguish input/output parameters.
(cp_convert_omp_range_for): Likewise.
(cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop
and largely rewritten. Add more comments.
(insert_structured_blocks): New.
(find_structured_blocks): New.
(struct sit_data, substitute_in_tree_walker, substitute_in_tree):
New.
(fixup_blocks_walker): New.
(cp_parser_omp_for_loop): Rewrite to use recursive descent instead
of a loop. Add logic to reshuffle the bits of code collected
during parsing so intervening code gets moved to the loop body.
(cp_parser_omp_loop): Remove call to finish_omp_for_block, which
is now redundant.
(cp_parser_omp_simd): Likewise.
(cp_parser_omp_for): Likewise.
(cp_parser_omp_distribute): Likewise.
(cp_parser_oacc_loop): Likewise.
(cp_parser_omp_taskloop): Likewise.
(cp_parser_pragma): Reject OpenMP pragmas in intervening code.
* parser.h (struct cp_parser): Add omp_for_parse_state field.
* pt.cc (tsubst_omp_for_iterator): Adjust call to
cp_convert_omp_range_for.
* semantics.cc (finish_omp_for): Try harder to preserve location
of loop variable init expression for use in diagnostics.
(struct fofb_data, finish_omp_for_block_walker): New.
(finish_omp_for_block): Allow variables to be bound in a BIND_EXPR
nested inside BIND instead of directly in BIND itself.
gcc/testsuite/ChangeLog
* c-c++-common/goacc/tile-2.c: Adjust expected error patterns.
* g++.dg/gomp/attrs-imperfect1.C: New test.
* g++.dg/gomp/attrs-imperfect2.C: New test.
* g++.dg/gomp/attrs-imperfect3.C: New test.
* g++.dg/gomp/attrs-imperfect4.C: New test.
* g++.dg/gomp/attrs-imperfect5.C: New test.
* g++.dg/gomp/pr41967.C: Adjust expected error patterns.
* g++.dg/gomp/tpl-imperfect-gotos.C: New test.
* g++.dg/gomp/tpl-imperfect-invalid-scope.C: New test.
libgomp/ChangeLog
* testsuite/libgomp.c++/attrs-imperfect1.C: New test.
* testsuite/libgomp.c++/attrs-imperfect2.C: New test.
* testsuite/libgomp.c++/attrs-imperfect3.C: New test.
* testsuite/libgomp.c++/attrs-imperfect4.C: New test.
* testsuite/libgomp.c++/attrs-imperfect5.C: New test.
* testsuite/libgomp.c++/attrs-imperfect6.C: New test.
* testsuite/libgomp.c++/imperfect-class-1.C: New test.
* testsuite/libgomp.c++/imperfect-class-2.C: New test.
* testsuite/libgomp.c++/imperfect-class-3.C: New test.
* testsuite/libgomp.c++/imperfect-destructor.C: New test.
* testsuite/libgomp.c++/imperfect-template-1.C: New test.
* testsuite/libgomp.c++/imperfect-template-2.C: New test.
* testsuite/libgomp.c++/imperfect-template-3.C: New test.
Sandra Loosemore [Thu, 24 Aug 2023 17:35:00 +0000 (17:35 +0000)]
OpenMP: C front end support for imperfectly-nested loops
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop. In GCC this code is moved
into the inner loop body by the respective front ends.
This patch changes the C front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an iterative
approach, in order to preserve proper nesting of compound statements.
New common C/C++ testcases are in a separate patch.
gcc/c/ChangeLog
* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
(struct omp_for_parse_data): New.
(check_omp_intervening_code): New.
(add_structured_block_stmt): New.
(c_parser_compound_statement_nostart): Recognize intervening code,
nested loops, and other things that need special handling in
OpenMP loop constructs.
(c_parser_while_statement): Error on loop in intervening code.
(c_parser_do_statement): Likewise.
(c_parser_for_statement): Likewise.
(c_parser_postfix_expression_after_primary): Error on calls to
the OpenMP runtime in intervening code.
(c_parser_pragma): Error on OpenMP pragmas in intervening code.
(c_parser_omp_loop_nest): New.
(c_parser_omp_for_loop): Rewrite to use recursive descent, calling
c_parser_omp_loop_nest to do the heavy lifting.
gcc/ChangeLog
* omp-api.h: New.
* omp-general.cc (omp_runtime_api_procname): New.
(omp_runtime_api_call): Moved here from omp-low.cc, and make
non-static.
* omp-general.h: Include omp-api.h.
* omp-low.cc (omp_runtime_api_call): Delete this copy.
gcc/testsuite/ChangeLog
* c-c++-common/goacc/collapse-1.c: Update for new C error behavior.
* c-c++-common/goacc/tile-2.c: Likewise.
* gcc.dg/gomp/collapse-1.c: Likewise.
Sandra Loosemore [Thu, 24 Aug 2023 17:34:59 +0000 (17:34 +0000)]
OpenMP: Add OMP_STRUCTURED_BLOCK and GIMPLE_OMP_STRUCTURED_BLOCK.
In order to detect invalid jumps in and out of intervening code in
imperfectly-nested loops, the front ends need to insert some sort of
marker to identify the structured block sequences that they push into
the inner body of the loop. The error checking happens in the
diagnose_omp_blocks pass, between gimplification and OMP lowering, so
we need both GENERIC and GIMPLE representations of these markers.
They are removed in OMP lowering so no subsequent passes need to know
about them.
This patch doesn't include any front-end changes to generate the new
data structures.
Vineet Gupta [Mon, 7 Aug 2023 20:45:29 +0000 (13:45 -0700)]
RISC-V: Enable Hoist to GCSE simple constants
Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates. For a simple const (say 6 which needs seperate insn "LI 6")
backend currently returns 0, causing Hoist to bail and elide GCSE.
Note that constants requiring more than 1 insns to setup were working
fine since riscv_rtx_costs () was returning non-zero (although that
itself might need refining: see bugzilla 111139).
To keep testsuite parity, some V tests need updating which started failing
in the new costing regime.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_rtx_costs): Adjust const_int
cost. Add some comments about different constants handling.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Remove test
for Jump.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.
In PR 106677, I noticed that on the trunk we were producing:
```
_25 = SR.116_117 == 0;
_27 = (unsigned char) _25;
_32 = _27 | SR.116_117;
```
From `SR.115_117 != 0 ? SR.115_117 : 1`
Rather than:
```
_119 = MAX_EXPR <1, SR.115_117>;
```
Or (rather)
```
_119 = SR.115_117 | 1;
```
Due to the order of the patterns.
Committed as approved with the new comment and testcase.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Andrew Pinski [Sun, 20 Aug 2023 00:56:46 +0000 (17:56 -0700)]
MATCH: `a | C -> C` when we know that `a & ~C == 0`
Even though this is handled by other code inside both VRP and CCP,
sometimes we want to optimize this outside of VRP and CCP.
An example is given in PR 106677 where phiopt will happen
after VRP (which removes a cast for a comparison) and then
phiopt will optimize the phi to be `a | 1` which can then
be optimized to `1` due to this patch.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note Similar code already exists in simplify_rtx for the RTL level;
it was moved from combine to simplify_rtx in r0-72539-gbd1ef757767f6d.
gcc/ChangeLog:
Harald Anlauf [Thu, 24 Aug 2023 21:16:25 +0000 (23:16 +0200)]
Fortran: improve bounds checking for DATA with implied-do [PR35095]
gcc/fortran/ChangeLog:
PR fortran/35095
* data.cc (get_array_index): Add bounds-checking code and return error
status. Overindexing will be allowed as an extension for -std=legacy
and generate an error in standard-conforming mode.
(gfc_assign_data_value): Use error status from get_array_index for
graceful error recovery.
gcc/testsuite/ChangeLog:
PR fortran/35095
* gfortran.dg/data_bounds_1.f90: Adjust options to disable warnings.
* gfortran.dg/data_bounds_2.f90: New test.
David Malcolm [Fri, 25 Aug 2023 12:41:19 +0000 (08:41 -0400)]
analyzer: fix ICE in text art strings support
gcc/analyzer/ChangeLog:
* access-diagram.cc (class string_region_spatial_item): Remove
assumption that the string is written to the start of the cluster.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-17.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-18.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-19.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Richard Biener [Fri, 25 Aug 2023 11:37:30 +0000 (13:37 +0200)]
tree-optimization/111137 - dependence checking for SLP
The following fixes a mistake with SLP dependence checking. When
checking whether we can hoist loads to the first load place we
special-case stores of the same instance considering them sunk
to the last store place. But we fail to consider that stores from
other SLP instances are sunk in a similar way. This leads us to
miss the dependence between (A) and (B) in
where the zeroing stores are sunk to (A') and the loads hoisted
to (B'). The following fixes this, treating grouped stores from
other instances similar to stores from our own instance. The
difference is - and this is more conservative than necessary - that
we don't know which stores of a group are in which SLP instance
(though I believe either all of the grouped stores will be in
a single SLP instance or in none at the moment), so we don't
know which stores are sunk where. We simply assume they are
all sunk to the last store we run into. Likewise we do not take
into account that an SLP instance might be cancelled (or a grouped
store not actually belong to any instance).
PR tree-optimization/111137
* tree-vect-data-refs.cc (vect_slp_analyze_load_dependences):
Properly handle grouped stores from other SLP instances.
Richard Biener [Fri, 25 Aug 2023 09:43:36 +0000 (11:43 +0200)]
Apply some TLC to vect_slp_analyze_instance_dependence
This refactors things, separating load and store handing, adjusting
comments to reflect reality and removing some dead code.
* tree-vect-data-refs.cc (vect_slp_analyze_store_dependences):
Split out from vect_slp_analyze_node_dependences, remove
dead code.
(vect_slp_analyze_load_dependences): Split out from
vect_slp_analyze_node_dependences, adjust comments. Process
queued stores before any disambiguation.
(vect_slp_analyze_node_dependences): Remove.
(vect_slp_analyze_instance_dependence): Adjust.
Aldy Hernandez [Wed, 23 Aug 2023 10:23:49 +0000 (12:23 +0200)]
[frange] Relax floating point relational folding.
This patch implements a new frelop_early_resolve() that handles the
NAN special cases instead of calling into the integer version which
can break for some combinations. Relaxing FP conditional folding in
this matter allows ranger to do a better job resulting in more
threading opportunities, among other things.
In auditing ranger versus DOM scoped tables I've noticed we are too
cautious when folding floating point conditionals involving
relationals. We refuse to fold anything if there is the possibility
of a NAN, but this is overly restrictive.
For example:
if (x_5 != y_8)
if (x_5 != y_8)
link_error ();
In range-ops, we fail to fold the second conditional because
frelop_early_resolve bails on anything that may have a NAN, but in the
above case the possibility of a NAN is inconsequential.
However, there are some cases where we must be careful, because a NAN
can complicate matters:
if (x_5 == x_5)
...
Here the operands to EQ_EXPR are the same so we get VREL_EQ as the
relation. However, we can't fold the conditional unless we know x_5
cannot be a NAN.
On the other hand, we can fold the second conditional here:
if (x_5 == x_5)
if (x_5 > x_5)
Because on the TRUE side of the first conditional we are guaranteed to
be free of NANs.
This patch is basically an inline of the integer version of
relop_early_resolve() with special casing for floats.
The main thing to keep in mind is that the relation coming into a
range-op entry may have a NAN, and for that one must look at the
operands. This makes the relations akin to unordered comparisons,
making VREL_LT behave like VREL_UNLT would.
The tricky corner cases are VREL_EQ and VREL_NE, as discussed above.
Apart from these that are special cased, the relation table for
intersect should work fine for returning a FALSE, even with NANs. The
union table, not so much and is documented in the code.
This allows us to add some optimizations for the unordered operators.
For example, a relation of VREL_LT on entry to an operator allows us
to fold an UNLT_EXPR as true, even with NANs because in this case
VREL_LT is really VREL_UNLT which maps perfectly.
BTW, we batted some ideas on how to get this work, and it seems this
is the cleaner route with the special cases nestled in the operators
themselves. Another idea is to add unordered relations, but that
would require bloating the various tables adding spots for VREL_UNEQ,
VREL_UNLT, etc, plus adding relations for VREL_UNORDERED so the
intersects work correctly. I'm not wed to either one, and we can
certainly revisit this if it becomes burdensome to maintain (or to get
right).
gcc/ChangeLog:
* range-op-float.cc (frelop_early_resolve): Rewrite for better NAN
handling.
(operator_not_equal::fold_range): Adjust for relations.
(operator_lt::fold_range): Same.
(operator_gt::fold_range): Same.
(foperator_unordered_equal::fold_range): Same.
(foperator_unordered_lt::fold_range): Same.
(foperator_unordered_le::fold_range): Same.
(foperator_unordered_gt::fold_range): Same.
(foperator_unordered_ge::fold_range): Same.
Richard Biener [Fri, 25 Aug 2023 07:42:16 +0000 (09:42 +0200)]
tree-optimization/111136 - STMT_VINFO_SLP_VECT_ONLY and stores
vect_dissolve_slp_only_groups currently only expects loads, for stores
we have to make sure to mark the dissolved "groups" strided.
PR tree-optimization/111136
* tree-vect-loop.cc (vect_dissolve_slp_only_groups): For
stores force STMT_VINFO_STRIDED_P and also duplicate that
to all elements.
Patrick O'Neill [Thu, 24 Aug 2023 16:56:01 +0000 (09:56 -0700)]
RISC-V: Move vector-abi testcases into rvv/base folder
Resolves failures like this on rv32gcv linux:
compiler exited with status 1
output is:
In file included from /tc-baseline/build-linux-gcv/sysroot/usr/include/features.h:515,
from /tc-baseline/build-linux-gcv/sysroot/usr/include/bits/libc-header-start.h:33,
from /tc-baseline/build-linux-gcv/sysroot/usr/include/stdint.h:26,
from /tc-baseline/build-linux-gcv/lib/gcc/riscv32-unknown-linux-gnu/14.0.0/include/stdint.h:9,
from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/stdint.h:9,
from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/riscv_vector.h:28,
from /tc-baseline/gcc/gcc/testsuite/gcc.target/riscv/vector-abi-1.c:4:
/tc-baseline/build-linux-gcv/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.
Tested using:
rv{32/64}{gc/gcv} newlib
rv{32/64}gcv linux
gcc/testsuite/ChangeLog:
* gcc.target/riscv/vector-abi-1.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-1.c: ...here.
* gcc.target/riscv/vector-abi-2.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-2.c: ...here.
* gcc.target/riscv/vector-abi-3.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-3.c: ...here.
* gcc.target/riscv/vector-abi-4.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-4.c: ...here.
* gcc.target/riscv/vector-abi-5.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-5.c: ...here.
* gcc.target/riscv/vector-abi-6.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-6.c: ...here.
* gcc.target/riscv/vector-abi-7.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-7.c: ...here.
* gcc.target/riscv/vector-abi-8.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-8.c: ...here.
* gcc.target/riscv/vector-abi-9.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-9.c: ...here.
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Jose E. Marchesi [Thu, 24 Aug 2023 15:10:52 +0000 (17:10 +0200)]
Fix tests for PR 106537.
This patch fixes the tests for PR 106537 (support for
-W[no]-compare-distinct-pointer-types) which were expecting the
warning when checking for equality/inequality of void pointers with
non-function pointers.
gcc/testsuite/ChangeLog:
PR c/106537
* gcc.c-torture/compile/pr106537-1.c: Comparing void pointers to
non-function pointers is legit.
* gcc.c-torture/compile/pr106537-2.c: Likewise.
David Malcolm [Thu, 24 Aug 2023 14:24:40 +0000 (10:24 -0400)]
analyzer: implement kf_strcat [PR105899]
gcc/analyzer/ChangeLog:
PR analyzer/105899
* call-details.cc
(call_details::check_for_null_terminated_string_arg): Split into
overloads, one taking just an arg_idx, the other a new
"include_terminator" param.
* call-details.h: Likewise.
* kf.cc (class kf_strcat): New.
(kf_strcpy::impl_call_pre): Update for change to
check_for_null_terminated_string_arg.
(register_known_functions): Register kf_strcat.
* region-model.cc
(region_model::check_for_null_terminated_string_arg): Split into
overloads, one taking just an arg_idx, the other a new
"include_terminator" param. When returning an svalue, handle
"include_terminator" being false by subtracting one.
* region-model.h
(region_model::check_for_null_terminated_string_arg): Split into
overloads, one taking just an arg_idx, the other a new
"include_terminator" param.
gcc/ChangeLog:
PR analyzer/105899
* doc/invoke.texi (Static Analyzer Options): Add "strcat" to the
list of functions known to the analyzer.
gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/strcat-1.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
PR analyzer/105899
* region-model.cc (fragment::has_null_terminator): Move STRING_CST
handling to fragment::string_cst_has_null_terminator; also use it to
handle INIT_VAL(STRING_REG).
(fragment::string_cst_has_null_terminator): New, from above.
David Malcolm [Thu, 24 Aug 2023 14:24:38 +0000 (10:24 -0400)]
analyzer: reimplement kf_strcpy [PR105899]
This patch reimplements the analyzer's implementation of strcpy using
the region_model::scan_for_null_terminator infrastructure, so that e.g.
it can complain about out-of-bounds reads/writes, unterminated strings,
etc.
gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf.cc (kf_strcpy::impl_call_pre): Reimplement using
check_for_null_terminated_string_arg.
* region-model.cc (region_model::get_store_bytes): Shortcut
reading all of a string_region.
(region_model::scan_for_null_terminator): Use get_store_value for
the bytes rather than "unknown" when returning an unknown length.
(region_model::write_bytes): New.
* region-model.h (region_model::write_bytes): New decl.
gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/out-of-bounds-diagram-16.c: New test.
* gcc.dg/analyzer/strcpy-1.c: Add test coverage.
* gcc.dg/analyzer/strcpy-3.c: Likewise.
* gcc.dg/analyzer/strcpy-4.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Richard Biener [Thu, 24 Aug 2023 07:32:54 +0000 (09:32 +0200)]
tree-optimization/111123 - indirect clobbers thrown away too early
The testcase in the PR shows that late uninit diagnostic relies
on indirect clobbers in CTORs but we throw those away in the fab
pass which is too early. The reasoning was they were supposed
to keep SSA names live but that's no longer the case since DCE
doesn't treat them as keeping SSA uses live.
The following instead removes them before out-of-SSA coalescing
which is the thing that's still affected by them.
PR tree-optimization/111123
* tree-ssa-ccp.cc (pass_fold_builtins::execute): Do not
remove indirect clobbers here ...
* tree-outof-ssa.cc (rewrite_out_of_ssa): ... but here.
(remove_indirect_clobbers): New function.
* g++.dg/warn/Wuninitialized-pr111123-1.C: New testcase.
Jan Hubicka [Thu, 24 Aug 2023 13:10:46 +0000 (15:10 +0200)]
Check that passes do not forget to define profile
This patch extends verifier to check that all probabilities and counts are
initialized if profile is supposed to be present. This is a bit complicated
by the posibility that we inline !flag_guess_branch_probability function
into function with profile defined and in this case we need to stop
verification. For this reason I added flag to cfg structure tracking this.
Bootstrapped/regtested x86_64-linux, comitted.
gcc/ChangeLog:
* cfg.h (struct control_flow_graph): New field full_profile.
* auto-profile.cc (afdo_annotate_cfg): Set full_profile to true.
* cfg.cc (init_flow): Set full_profile to false.
* graphite.cc (graphite_transform_loops): Set full_profile to false.
* lto-streamer-in.cc (input_cfg): Initialize full_profile flag.
* predict.cc (pass_profile::execute): Set full_profile to true.
* symtab-thunks.cc (expand_thunk): Set full_profile to true.
* tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full
if full_profile is set.
* tree-inline.cc (initialize_cfun): Initialize full_profile.
(expand_call_inline): Combine full_profile.
Paul Dreik [Thu, 24 Aug 2023 10:43:43 +0000 (11:43 +0100)]
libstdc++: fix illegal pointer arithmetic in format [PR111102]
When parsing a format string, the width is parsed into an unsigned short
but the result is not checked in the case the format string is not a
char string (such as a wide string). In case the parse fails, a null
pointer is returned which is used for pointer arithmetic which is
undefined behaviour.
Signed-off-by: Paul Dreik <gccpatches@pauldreik.se>
libstdc++-v3/ChangeLog:
PR libstdc++/111102
* include/std/format (__format::__parse_integer): Check for
non-null pointer.
Jonathan Wakely [Thu, 17 Aug 2023 23:13:51 +0000 (00:13 +0100)]
libstdc++: Tweak some preprocessor conditions for feature tests
Update a preprocessor condition using __cplusplus and _GLIBCXX_HOSTED
to use the relevant feature test macro for <syncstream>.
Also add comments to some conditions saying which C++ standard revision
the check corresponds to.
libstdc++-v3/ChangeLog:
* include/std/atomic: Add comment to #ifdef and fix indentation.
* include/std/ostream: Check __glibcxx_syncbuf instead of
__cplusplus and _GLIBCXX_HOSTED.
* include/std/thread: Add comment to #ifdef.
Jonathan Wakely [Wed, 23 Aug 2023 14:51:49 +0000 (15:51 +0100)]
libstdc++: Implement new SI prefixes in <ratio> for C++23 (P2734R0)
This is a no-op for libstdc++, because our intmax_t is a 64-bit type and
so is incapable of representing the largest and smallest ratios from
C++11, let alone the new ones. I've added them to the file anyway (and
defined the feature test macro) so that if somebody ports libstdc++ to a
target with 128-bit intmax_t then they'll be present.
Richard Biener [Thu, 24 Aug 2023 11:46:12 +0000 (13:46 +0200)]
Fix confusion about load_p in vect_build_slp_tree_1
load_p is set and used as to whether the stmt is a memory operation,
not whether it is only a load. The following renames it to ldst_p
to avoid this confusion. It also replaces checking for a VUSE
with checking STMT_VINFO_DATA_REF since VUSE checking doesn't
work for pattern matched stores where no virtual operands are
present. Where we want to distinguish between loads and stores
we then check DR_IS_READ/WRITE.
I've made a classification mistake with .MASK_STORE support and
this hits other complications when dealing with single-lane SLP.
* tree-vect-slp.cc (vect_build_slp_tree_1): Rename
load_p to ldst_p, fix mistakes and rely on
STMT_VINFO_DATA_REF.
Jonathan Wakely [Wed, 23 Aug 2023 11:10:16 +0000 (12:10 +0100)]
libstdc++: Add pretty printer for std::locale
Print the locale's name, except when it uses the same named C locale for
all categories except one, in which case print something like:
std::locale = "en_GB.UTF-8" with "LC_CTYPE=en_US.UTF-8"
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (StdLocalePrinter): New
printer class.
* testsuite/libstdc++-prettyprinters/locale.cc: New test.
Jonathan Wakely [Tue, 22 Aug 2023 13:26:51 +0000 (14:26 +0100)]
libstdc++: Declutter std::optional and std:variant pretty printers [PR110944]
As the PR says, including the template arguments in the GDB output of
these class templates can result in very long names, especially for
std::variant. You can use 'whatis' or other GDB commands to get details
of the type, we don't need to include it in the value.
We could consider including the type if it's not too long, but I think
consistency is better (and we already omit the template arguments for
std::vector and other class templates).
libstdc++-v3/ChangeLog:
PR libstdc++/110944
* python/libstdcxx/v6/printers.py (StdExpOptionalPrinter): Do
not show template arguments.
(StdVariantPrinter): Likewise.
* testsuite/libstdc++-prettyprinters/compat.cc: Adjust expected
output.
* testsuite/libstdc++-prettyprinters/cxx17.cc: Likewise.
* testsuite/libstdc++-prettyprinters/libfundts.cc: Likewise.
This patch is depending on middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html
We already had COND_LEN_FNMA/COND_LEN_FMS/COND_FNMS patterns.
Remove TARGET_PREFERRED_ELSE_VALUE since it forbid the COND_LEN_FMS/COND_LEN_FNMS STMT fold.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_preferred_else_value): Remove it since
it forbid COND_LEN_FMS/COND_LEN_FNMS STMT fold.
(TARGET_PREFERRED_ELSE_VALUE): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adapt test.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-9.c: New test.
Robin Dapp [Fri, 18 Aug 2023 13:57:16 +0000 (15:57 +0200)]
RISC-V: Enable pressure-aware scheduling by default.
this patch enables pressure-aware scheduling for riscv. There have been
various requests for it so I figured I'd just go ahead and send
the patch.
There is some slight regression in code quality for a number of
vector tests where we spill more due to different instructions order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.
As cost and scheduling models mature I expect the situation to improve
and for now I think it's generally favorable to enable pressure-aware
scheduling so we can work with it rather than trying to find every
possible problem in advance.
Robin Dapp [Tue, 15 Aug 2023 15:15:58 +0000 (17:15 +0200)]
RISC-V: Fix reduc_strict_run-1 test case.
This patch fixes the reduc_strict_run-1 testcase by introducing
a variable that holds the reference result. This is necessary
because in presence of _Float16 emulation an intermediate
result used in a comparison is computed in higher precision.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c:
Add variable to hold reference result.
Richard Biener [Thu, 24 Aug 2023 09:10:43 +0000 (11:10 +0200)]
tree-optimization/111125 - avoid BB vectorization in novector loops
When a loop is marked with
#pragma GCC novector
the following makes sure to also skip BB vectorization for contained
blocks. That avoids gcc.dg/vect/bb-slp-29.c failing on aarch64
because of extra BB vectorization therein. I'm not specifically
dealing with sub-loops of novector loops, the desired semantics
isn't documented.
PR tree-optimization/111125
* tree-vect-slp.cc (vect_slp_function): Split at novector
loop entry, do not push blocks in novector loops.
[[]] attributes are a recent addition to C, but as a GNU extension,
GCC allows them to be used in C11 and earlier. Normally this use
would trigger a pedwarn (for -pedantic, -Wc11-c2x-compat, etc.).
This patch allows the pedwarn to be suppressed by starting the
attribute-list with __extension__.
Also, :: is not a single lexing token prior to C2X, so it wasn't
possible to use scoped attributes in C11, even as a GNU extension.
The patch allows two colons to be used in place of :: when
__extension__ is used. No attempt is made to check whether the
two colons are immediately adjacent.
gcc/
* doc/extend.texi: Document the C [[__extension__ ...]] construct.
gcc/c/
* c-parser.cc (c_parser_std_attribute): Conditionally allow
two colons to be used in place of ::.
(c_parser_std_attribute_list): New function, split out from...
(c_parser_std_attribute_specifier): ...here. Allow the attribute-list
to start with __extension__. When it does, also allow two colons
to be used in place of ::.
gcc/testsuite/
* gcc.dg/c2x-attr-syntax-6.c: New test.
* gcc.dg/c2x-attr-syntax-7.c: Likewise.
Juzhe-Zhong [Tue, 22 Aug 2023 01:58:34 +0000 (09:58 +0800)]
gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold
Hi, Richard and Richi.
Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math.
It's supported in tree-ssa-math-opts.cc. However, GCC failed to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
Consider this following case:
__attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
TYPE *__restrict a, \
TYPE *__restrict b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] -= a[i] * b[i]; \
}
Richard Biener [Wed, 23 Aug 2023 12:28:26 +0000 (14:28 +0200)]
tree-optimization/111115 - SLP of masked stores
The following adds the capability to do SLP on .MASK_STORE, I do not
plan to add interleaving support.
PR tree-optimization/111115
gcc/
* tree-vectorizer.h (vect_slp_child_index_for_operand): New.
* tree-vect-data-refs.cc (can_group_stmts_p): Also group
.MASK_STORE.
* tree-vect-slp.cc (arg3_arg2_map): New.
(vect_get_operand_map): Handle IFN_MASK_STORE.
(vect_slp_child_index_for_operand): New function.
(vect_build_slp_tree_1): Handle statements with no LHS,
masked store ifns.
(vect_remove_slp_scalar_calls): Likewise.
* tree-vect-stmts.cc (vect_check_store_rhs): Lookup the
SLP child corresponding to the ifn value index.
(vectorizable_store): Likewise for the mask index. Support
masked stores.
(vectorizable_load): Lookup the SLP child corresponding to the
ifn mask index.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_masked_store):
Supported with check_avx_available.
* gcc.dg/vect/slp-mask-store-1.c: New testcase.
We assume that all root stmts which compose the total reduction chain
are vectorized but fail to account for the cost of adding back the
scalar defs we are not vectorizing. The following rectifies this,
fixing the gcc.dg/tree-ssa/slsr-11.c FAIL on aarch64.
PR tree-optimization/111125
* tree-vect-slp.cc (vectorizable_bb_reduc_epilogue): Account
for the remain_defs processing.
aarch64: Account for different Advanced SIMD fusing options
The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean
that either side of a subtraction can start an accumulator chain.
However, Advanced SIMD doesn't have an equivalent instruction.
This means that, for Advanced SIMD, a subtraction can only be
fused if the second operand is a multiplication.
Also, if both sides of a subtraction are multiplications,
and if the second operand is used multiple times, such as:
c * d - a * b
e * f - a * b
then the first rather than second multiplication operand will tend
to be fused. On Advanced SIMD, this leads to:
tmp1 = a * b
tmp2 = -tmp1
... = tmp2 + c * d // FMLA
... = tmp2 + e * f // FMLA
where one of the FMLAs also requires a MOV.
This patch tries to account for this in the vector cost model.
It improves roms performance by 2-3% on Neoverse V1. It's also
needed to avoid a regression in fotonik for Neoverse N2 and
Neoverse V2 with the patch for PR110625.
gcc/
* config/aarch64/aarch64.cc: Include ssa.h.
(aarch64_multiply_add_p): Require the second operand of an
Advanced SIMD subtraction to be a multiplication. Assume that
such an operation won't be fused if the second operand is used
multiple times and if the first operand is also a multiplication.
gcc/testsuite/
* gcc.target/aarch64/neoverse_v1_2.c: New test.
* gcc.target/aarch64/neoverse_v1_3.c: Likewise.
The following fixes placement of shift operand sanitization with
MIN when the original shift operand was external but the actual
one is not.
PR tree-optimization/111128
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Emit external shift operand inline if we promoted it with
another pattern stmt.
Richard Biener [Thu, 24 Aug 2023 08:55:06 +0000 (10:55 +0200)]
testsuite/111125 - disable BB vectorization for the test
The test is for loop vectorization producing non-canonical
multiplications. We can now BB vectorize the whole function
when the target supports .REDUC_PLUS for V2SImode but we don't
have a dejagnu selector for that. Disable BB vectorization
like we disabled epilogue vectorization.
Andrew Pinski [Wed, 23 Aug 2023 16:46:10 +0000 (16:46 +0000)]
MATCH: [PR111109] Fix bit_ior(cond,cond) when comparisons are fp
The patterns that were added in r13-4620-g4d9db4bdd458, missed that
(a > b) and (a <= b) are not inverse of each other for floating point
comparisons (if NaNs are supported). Even though there was a check for
intergal types, it was only for the result of the cond rather for the
type of what is being compared. The fix is to check to see if cmp and
icmp are inverse of each other by using the invert_tree_comparison function.
OK for trunk and GCC 13 branch? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
I added the testcase to execute/ieee as it requires support for NAN.
PR tree-optimization/111109
gcc/ChangeLog:
* match.pd (ior(cond,cond), ior(vec_cond,vec_cond)):
Add check to make sure cmp and icmp are inverse.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/ieee/fp-cmp-cond-1.c: New test.
Andrew Pinski [Wed, 23 Aug 2023 01:41:56 +0000 (18:41 -0700)]
MATCH: remove negate for 1bit types
For 1bit types, negate is either undefined or don't change the value.
In either cases we want to remove them.
This patch adds a match pattern to do that.
Also converting to a 1bit type we can remove the negate just like we already do
for `&1` so this patch adds that too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Notes on the testcases:
This patch is the last part to fix PR 95929; cond-bool-2.c testcase.
bit1neg-1.c is a 1bit-field testcase where we could remove the assignment
all the way in one case (which happened on the RTL level for some targets but not all).
cond-bool-2.c is the reduced testcase of PR 95929.
PR tree-optimization/95929
gcc/ChangeLog:
* match.pd (convert?(-a)): New pattern
for 1bit integer types.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bit1neg-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-2.c: New test.
Richard Biener [Mon, 21 Aug 2023 08:34:30 +0000 (10:34 +0200)]
debug/111080 - avoid outputting debug info for unused restrict qualified type
The following applies some maintainance with respect to type qualifiers
and kinds added by later DWARF standards to prune_unused_types_walk.
The particular case in the bug is not handling (thus marking required)
all restrict qualified type DIEs. I've found more DW_TAG_*_type that
are unhandled, looked up the DWARF docs and added them as well based
on common sense.
PR debug/111080
* dwarf2out.cc (prune_unused_types_walk): Handle
DW_TAG_restrict_type, DW_TAG_shared_type, DW_TAG_atomic_type,
DW_TAG_immutable_type, DW_TAG_coarray_type, DW_TAG_unspecified_type
and DW_TAG_dynamic_type as to only output them when referenced.
liuhongt [Tue, 22 Aug 2023 10:18:31 +0000 (18:18 +0800)]
Fix target_clone ("arch=graniterapids-d") and target_clone ("arch=arrowlake-s")
Both "graniterapid-d" and "graniterapids" are attached with
PROCESSOR_GRANITERAPID in processor_alias_table but mapped to
different __cpu_subtype in get_intel_cpu.
And get_builtin_code_for_version will try to match the first
PROCESSOR_GRANITERAPIDS in processor_alias_table which maps to
"granitepraids" here.
861 else if (new_target->arch_specified && new_target->arch > 0)
1862 for (i = 0; i < pta_size; i++)
1863 if (processor_alias_table[i].processor == new_target->arch)
1864 {
1865 const pta *arch_info = &processor_alias_table[i];
1866 switch (arch_info->priority)
1867 {
1868 default:
1869 arg_str = arch_info->name;
This mismatch makes dispatch_function_versions check the preidcate
of__builtin_cpu_is ("graniterapids") for "graniterapids-d" and causes
the issue.
The patch explicitly adds PROCESSOR_ARROWLAKE_S and
PROCESSOR_GRANITERAPIDS_D to make a distinction.
For "alderlake","raptorlake", "meteorlake" they share same isa, cost,
tuning, and mapped to the same __cpu_type/__cpu_subtype in
get_intel_cpu, so no need to add PROCESSOR_RAPTORLAKE and others.
gcc/ChangeLog:
* common/config/i386/i386-common.cc (processor_names): Add new
member graniterapids-s and arrowlake-s.
* config/i386/i386-options.cc (processor_alias_table): Update
table with PROCESSOR_ARROWLAKE_S and
PROCESSOR_GRANITERAPIDS_D.
(m_GRANITERAPID_D): New macro.
(m_ARROWLAKE_S): Ditto.
(m_CORE_AVX512): Add m_GRANITERAPIDS_D.
(processor_cost_table): Add icelake_cost for
PROCESSOR_GRANITERAPIDS_D and alderlake_cost for
PROCESSOR_ARROWLAKE_S.
* config/i386/x86-tune.def: Hanlde m_ARROWLAKE_S same as
m_ARROWLAKE.
* config/i386/i386.h (enum processor_type): Add new member
PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S.
* config/i386/i386-c.cc (ix86_target_macros_internal): Handle
PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S
Of particular interest is the value in a0 when we call consume. We compute that
horribly inefficiently. If we back-substitute from the final assignment to a0
we get...
That's a pretty convoluted way to compute sp - 3990616.
Something like this would be notably better (not great, but we need both the
stack adjustment and the address of the object to pass to consume):
addi sp,sp,-16
sd ra,8(sp)
li t0,-4001792
addi t0,t0,1792
add sp,sp,t0
li a0,4096
addi a0,a0,-96
add a0,sp,a0
call consume
The problem is LRA's elimination code is not handling the case where we have
(plus (reg1) (reg2) where reg1 is an eliminable register and reg2 has a known
equivalency, particularly a constant.
If we can determine that reg2 is equivalent to a constant and treat (plus
(reg1) (reg2)) in the same way we'd treat (plus (reg1) (const_int)) then we can
get the desired code.
This eliminates about 19b instructions, or roughly 1% for deepsjeng on rv64.
There are improvements elsewhere, but they're relatively small. This may
ultimately lessen the value of Manolis's fold-mem-offsets patch. So we'll have
to evaluate that again once he posts a new version.
Bootstrapped and regression tested on x86_64 as well as bootstrapped on rv64.
Earlier versions have been tested against spec2017. Pre-approved by Vlad in a
private email conversation (thanks Vlad!).
Committed to the trunk,
gcc/
* lra-eliminations.cc (eliminate_regs_in_insn): Use equivalences to
to help simplify code further.
Andrew MacLeod [Thu, 17 Aug 2023 16:34:59 +0000 (12:34 -0400)]
Phi analyzer - Initialize with range instead of a tree.
Rangers PHI analyzer currently only allows a single initializer to a group.
This patch changes that to use an inialization range, which is
cumulative of all integer constants, plus a single symbolic value.
There is no other change to group functionality.
This patch also changes the way PHI groups are printed so they show up in the
listing as they are encountered, rather than as a list at the end. It
was more difficult to see what was going on previously.
PR tree-optimization/110918 - Initialize with range instead of a tree.
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_phi): Tweak output.
* gimple-range-phi.cc (phi_group::phi_group): Remove unused members.
Initialize using a range instead of value and edge.
(phi_group::calculate_using_modifier): Use initializer value and
process for relations after trying for iteration convergence.
(phi_group::refine_using_relation): Use initializer range.
(phi_group::dump): Rework the dump output.
(phi_analyzer::process_phi): Allow multiple constant initilizers.
Dump groups immediately as created.
(phi_analyzer::dump): Tweak output.
* gimple-range-phi.h (phi_group::phi_group): Adjust prototype.
(phi_group::initial_value): Delete.
(phi_group::refine_using_relation): Adjust prototype.
(phi_group::m_initial_value): Delete.
(phi_group::m_initial_edge): Delete.
(phi_group::m_vr): Use int_range_max.
* tree-vrp.cc (execute_ranger_vrp): Don't dump phi groups.
Andrew MacLeod [Wed, 16 Aug 2023 17:23:06 +0000 (13:23 -0400)]
Don't process phi groups with one phi.
The phi analyzer should not create a phi group containing a single phi.
* gimple-range-phi.cc (phi_analyzer::operator[]): Return NULL if
no group was created.
(phi_analyzer::process_phi): Do not create groups of one phi node.
Uros Bizjak [Wed, 23 Aug 2023 14:39:21 +0000 (16:39 +0200)]
i386: Fix register spill failure with concat RTX [PR111010]
Disable (=&r,m,m) alternative for 32-bit targets. The combination of two
memory operands (possibly with complex addressing mode), early clobbered
output, frame pointer and PIC registers uses too much registers on
a register constrained 32-bit target.
Also merge two similar patterns using DWIH mode iterator.
PR target/111010
gcc/ChangeLog:
* config/i386/i386.md (*concat<any_or_plus:mode><dwi>3_3):
Merge pattern from *concatditi3_3 and *concatsidi3_3 using
DWIH mode iterator. Disable (=&r,m,m) alternative for
32-bit targets.
(*concat<any_or_plus:mode><dwi>3_3): Disable (=&r,m,m)
alternative for 32-bit targets.
Zhangjin Liao [Wed, 23 Aug 2023 14:02:47 +0000 (08:02 -0600)]
[PATCH] RISC-V:add a more appropriate type attribute
Due to the more accurate type attribute added to the clz, ctz, and pcnt
operations in https://github.com/gcc-mirror/gcc/commit/07e2576d6f3 the
same type attribute should be used here.
gcc/ChangeLog:
* config/riscv/bitmanip.md (*<bitmanip_optab>disi2_sext): Add a more
appropriate type attribute.
This patch add conditional unary neg/abs/not autovec patterns to RISC-V backend.
For this C code:
void
test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n)
{
for (int i = 0; i < n; i += 1)
{
a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i];
}
}
Before this patch:
...
vsetvli a7,zero,e32,m1,ta,ma
vfabs.v v2,v2
vmerge.vvm v1,v1,v2,v0
...
After this patch:
...
vsetvli a7,zero,e32,m1,ta,mu
vfabs.v v1,v2,v0.t
...
For int neg/not and FP neg patterns, Defining the corresponding cond_xxx paterns
is enough.
For the FP abs pattern, We need to change the definition of `abs<mode>2` and
`@vcond_mask_<mode><vm>` pattern from define_expand to define_insn_and_split
in order to fuse them into a new pattern `*cond_abs<mode>` at the combine pass.
A fusion process similar to the one below:
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.
Jan Hubicka [Wed, 23 Aug 2023 09:17:20 +0000 (11:17 +0200)]
Fix handling of static exists in loop_ch
This patch fixes wrong return value in should_duplicate_loop_header_p.
Doing so uncovered suboptimal decisions on some jump threading testcases
where we choose to stop duplicating just before basic block that has zero
cost and duplicating so would be always a win.
This is because the heuristics trying to choose right point to duplicate
all winning blocks and to get loop to be do_while did not account
zero_cost blocks in all cases. The patch simplifies the logic by
simply remembering zero cost blocks and handling them last after
the right stopping point is chosen.
gcc/ChangeLog:
* tree-ssa-loop-ch.cc (enum ch_decision): Fix comment.
(should_duplicate_loop_header_p): Fix return value for static exits.
(ch_base::copy_headers): Improve handling of ch_possible_zero_cost.
Lulu Cheng [Tue, 22 Aug 2023 11:56:21 +0000 (19:56 +0800)]
libffi: Backport of LoongArch support for libffi.
This is a backport of <https://github.com/libffi/libffi/commit/f259a6f6de>,
and contains modifications to commit 5a4774cd4d, as well as the LoongArch
schema portion of commit ee22ecbd11. This is needed for libgo.
libffi/ChangeLog:
PR libffi/108682
* configure.host: Add LoongArch support.
* Makefile.am: Likewise.
* Makefile.in: Regenerate.
* src/loongarch64/ffi.c: New file.
* src/loongarch64/ffitarget.h: New file.
* src/loongarch64/sysv.S: New file.
Kewen Lin [Wed, 23 Aug 2023 05:09:14 +0000 (00:09 -0500)]
vect: Move VMAT_GATHER_SCATTER handlings from final loop nest
Like r14-3317 which moves the handlings on memory access
type VMAT_GATHER_SCATTER in vectorizable_load final loop
nest, this one is to deal with vectorizable_store side.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Move the handlings on
VMAT_GATHER_SCATTER in the final loop nest to its own loop,
and update the final nest accordingly.
Kewen Lin [Wed, 23 Aug 2023 05:09:14 +0000 (00:09 -0500)]
vect: Move VMAT_LOAD_STORE_LANES handlings from final loop nest
Like commit r14-3214 which moves the handlings on memory
access type VMAT_LOAD_STORE_LANES in vectorizable_load
final loop nest, this one is to deal with the function
vectorizable_store.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Move the handlings on
VMAT_LOAD_STORE_LANES in the final loop nest to its own loop,
and update the final nest accordingly.
Kewen Lin [Wed, 23 Aug 2023 05:09:14 +0000 (00:09 -0500)]
vect: Remove some manual release in vectorizable_store
To avoid some duplicates in some follow-up patches on
function vectorizable_store, this patch is to adjust some
existing vec with auto_vec and remove some manual release
invocation. Also refactor a bit and remove some uesless
codes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Remove vec oprnds,
adjust vec result_chain, vec_oprnd with auto_vec, and adjust
gvec_oprnds with auto_delete_vec.