Robert Dubner [Mon, 21 Jul 2025 16:58:47 +0000 (12:58 -0400)]
cobol: Improved linemap and diagnostic handling; PIC validation. [PR120402]
Implementation of PICTURE string validation for PR120402. Expanded some printf
format attributes. Improved debugging and diagnostic messages. Improved
linemap and line location tracking in support of diagnostic messages and
location_t tagging of GENERIC nodes for improved GDB-COBOL performance.
Assorted changes to eliminate cppcheck warnings.
Co-Authored-By: James K. Lowden <jklowden@cobolworx.com>
Co-Authored-By: Robert Dubner <rdubner@symas.com>
gcc/cobol/ChangeLog:
PR cobol/120402
* Make-lang.in: Eliminate commented-out scripting.
* cbldiag.h (_CBLDIAG_H): Change #if 0 to #if GCOBOL_GETENV
(warn_msg): Add printf attributes.
(location_dump): Add debugging message.
* cdf.y: Improved linemap tracking.
* genapi.cc (treeplet_fill_source): const attribute for formal parameter.
(insert_nop): Created to consolidate var_decl_nop writes.
(build_main_that_calls_something): Move generation to the end of executable.
(level_88_helper): Formatting.
(parser_call_targets_dump): Formatting.
(function_pointer_from_name): const attribute for formal parameter.
(parser_initialize_programs): const attribute for formal parameter.
(parser_statement_begin): Improved linemap handling.
(section_label): Improved linemap handling.
(paragraph_label): Improved linemap handling.
(pseudo_return_pop): Improved linemap handling.
(leave_procedure): Formatting.
(parser_enter_section): Improved linemap handling.
(parser_enter_paragraph): Improved linemap handling.
(parser_perform): Formatting.
(parser_leave_file): Move creation of main() to this routine.
(parser_enter_program): Move creation of main from here to leave_file.
(parser_accept): Formatting. const attribute for formal parameter.
(parser_accept_command_line): const attribute for formal parameter.
(parser_accept_command_line_count): const attribute for formal parameter.
(parser_accept_envar): Likewise.
(parser_set_envar): Likewise.
(parser_display): Likewise.
(get_exhibit_name): Implement EXHIBIT verb.
(parser_exhibit): Likewise.
(parser_sleep): const attribute for formal parameter.
(parser_division): Improved linemap handling.
(parser_classify): const attribute for formal parameter.
(create_iline_address_pairs): Improved linemap handling.
(parser_perform_start): Likewise.
(perform_inline_until): Likewise.
(perform_inline_testbefore_varying): Likewise.
(parser_perform_until): Likewise.
(parser_perform_inline_times): Likewise.
(parser_intrinsic_subst): const attribute for formal parameter.
(parser_file_merge): Formatting.
(create_and_call): Improved linemap handling.
(mh_identical): const attribute for formal parameter.
(mh_numeric_display): const attribute for formal parameter.
(mh_little_endian): Likewise.
(mh_source_is_group): Likewise.
(psa_FldLiteralA): Formatting.
* genapi.h (parser_accept): const attribute for formal parameter.
(parser_accept_envar): Likewise.
(parser_set_envar): Likewise.
(parser_accept_command_line): Likewise.
(parser_accept_command_line_count): Likewise.
(parser_add): Likewise.
(parser_classify): Likewise.
(parser_sleep): Likewise.
(parser_exhibit): Likewise.
(parser_display): Likewise.
(parser_initialize_programs): Likewise.
(parser_intrinsic_subst): Likewise.
* gengen.cc (gg_assign): Improved linemap handling.
(gg_add_field_to_structure): Likewise.
(gg_define_from_declaration): Likewise.
(gg_build_relational_expression): Likewise.
(gg_goto_label_decl): Likewise.
(gg_goto): Likewise.
(gg_printf): Likewise.
(gg_fprintf): Likewise.
(gg_memset): Likewise.
(gg_memchr): Likewise.
(gg_memcpy): Likewise.
(gg_memmove): Likewise.
(gg_strcpy): Likewise.
(gg_strcmp): Likewise.
(gg_strncmp): Likewise.
(gg_return): Likewise.
(chain_parameter_to_function): Likewise.
(gg_define_function): Likewise.
(gg_get_function_decl): Likewise.
(gg_call_expr): Likewise.
(gg_call): Likewise.
(gg_call_expr_list): Likewise.
(gg_exit): Likewise.
(gg_abort): Likewise.
(gg_strlen): Likewise.
(gg_strdup): Likewise.
(gg_malloc): Likewise.
(gg_realloc): Likewise.
(gg_free): Likewise.
(gg_set_current_line_number): Likewise.
(gg_get_current_line_number): Likewise.
(gg_insert_into_assembler): Likewise.
(token_location_override): Likewise.
(gg_token_location): Likewise.
* gengen.h (location_from_lineno): Likewise.
(gg_set_current_line_number): Likewise.
(gg_get_current_line_number): Likewise.
(gg_token_location): Likewise.
(current_token_location): Likewise.
(current_location_minus_one): Likewise.
(current_location_minus_one_clear): Likewise.
(token_location_override): Likewise.
* genmath.cc (fast_divide): const attribute for formal parameter.
* genutil.cc (get_and_check_refstart_and_reflen): Likewise.
(get_data_offset): Likewise.
(refer_refmod_length): Likewise.
(refer_offset): Likewise.
(refer_size): Likewise.
(refer_size_dest): Likewise.
(refer_size_source): Likewise.
(qualified_data_location): Likewise.
* genutil.h (refer_offset): Likewise.
(refer_size_source): Likewise.
(refer_size_dest): Likewise.
(qualified_data_location): Likewise.
* parse.y: EVALUATE token; Implement EXHIBIT verb;
Improved linemap handling.
* parse_ante.h (input_file_status_notify): Improved linemap handling.
(location_set): Likewise.
* scan.l: PICTURE string validation.
* scan_ante.h (class picture_t): PICTURE string validation.
(validate_picture): Likewise.
* symbols.cc (symbol_currency): Revised default currency handling.
* symbols.h (symbol_currency): Likewise.
* util.cc (location_from_lineno): Improved linemap handling.
(current_token_location): Improved linemap handling.
(current_location_minus_one): Improved linemap handling.
(current_location_minus_one_clear): Improved linemap handling.
(gcc_location_set_impl): Improved linemap handling.
(warn_msg): Improved linemap handling.
* util.h (cobol_lineno): Improved linemap handling.
Andrew Pinski [Sun, 20 Jul 2025 18:21:08 +0000 (11:21 -0700)]
match: Add `cmp - 1` simplification to `-icmp` [PR110949]
I have seen this in a few places, though the testcase from PR 95906
is the most obvious place where it shows up.
This converts `cmp - 1` into `-icmp` (the negated inverse comparison),
as that form is more useful in many cases.
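For integer operands, the identity behind this simplification can be checked directly in C. In the sketch below (function names are illustrative, not from GCC), `a < b` plays the role of cmp and `a >= b` its inverse icmp. A comparison yields 0 or 1, so subtracting 1 yields 0 or -1, which is exactly the negated inverse comparison. This is a source-level illustration only, not the match.pd pattern itself.

```c
#include <assert.h>

/* Original form: the 0/1 result of the comparison, minus one.  */
static int cmp_minus_one(int a, int b)
{
  return (a < b) - 1;
}

/* Simplified form: negate the inverted comparison (a >= b).  */
static int neg_icmp(int a, int b)
{
  return -(a >= b);
}
```

Both functions return 0 when `a < b` holds and -1 otherwise.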
Changes since v1:
* v2: Add check for outer type's precision being greater than 1.
libstdc++: Strengthen exception guarantee for mdspan methods.
The mdspan::is_{,always}_{unique,strided,exhaustive} methods only call
their counterparts in mdspan::mapping_type. The standard specifies that
the methods of mdspan::mapping_type are noexcept, but doesn't specify if
the methods of mdspan are noexcept.
Libc++ strengthened the exception guarantee for these mdspan methods.
This commit conditionally strengthens these methods for libstdc++.
libstdc++-v3/ChangeLog:
* include/std/mdspan (mdspan::is_always_unique): Make
conditionally noexcept.
(mdspan::is_always_exhaustive): Ditto.
(mdspan::is_always_strided): Ditto.
(mdspan::is_unique): Ditto.
(mdspan::is_exhaustive): Ditto.
(mdspan::is_strided): Ditto.
* testsuite/23_containers/mdspan/layout_like.h: Make noexcept
configurable. Add ThrowingLayout.
* testsuite/23_containers/mdspan/mdspan.cc: Add tests for
noexcept.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Pan Li [Mon, 21 Jul 2025 01:13:27 +0000 (09:13 +0800)]
RISC-V: Add test for vec_duplicate + vaaddu.vv combine case 0 with GR2VR cost 0, 2 and 15 for QI, HI and SI mode
Add asm dump checks and run tests for the vec_duplicate + vaaddu.vv
combine to vaaddu.vx, with GR2VR costs of 0, 2 and 15. Note that
DImode is not included here.
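The combine is valid because broadcasting a scalar and then averaging element-wise gives the same result as averaging each element with the scalar directly. The scalar model below is a sketch under stated assumptions: the names are hypothetical, lanes are uint8_t, and the fixed-point rounding mode is reduced to a round_up flag standing for vxrm "rnu" versus "rdn". It does not model GR2VR costs or the real RVV instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Averaging add of two uint8_t lanes.  The sum is formed in a wider type,
   so it cannot overflow; round_up models vxrm "rnu" (add back the bit
   shifted out), otherwise "rdn" (truncate).  */
static uint8_t aaddu8(uint8_t a, uint8_t b, int round_up)
{
  unsigned sum = (unsigned) a + b;
  return (uint8_t) ((sum >> 1) + (round_up ? (sum & 1u) : 0u));
}

/* vec_duplicate + vaaddu.vv: broadcast S into a vector, then average.  */
static void vaaddu_vv_dup(uint8_t *out, const uint8_t *a, uint8_t s,
                          size_t n, int round_up)
{
  uint8_t dup[64];
  assert(n <= 64);  /* fixed-size scratch buffer in this sketch */
  for (size_t i = 0; i < n; i++)
    dup[i] = s;
  for (size_t i = 0; i < n; i++)
    out[i] = aaddu8(a[i], dup[i], round_up);
}

/* vaaddu.vx: use the scalar operand directly, with no broadcast.  */
static void vaaddu_vx(uint8_t *out, const uint8_t *a, uint8_t s,
                      size_t n, int round_up)
{
  for (size_t i = 0; i < n; i++)
    out[i] = aaddu8(a[i], s, round_up);
}
```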
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u8.c: New test.
Pan Li [Mon, 21 Jul 2025 01:06:52 +0000 (09:06 +0800)]
RISC-V: Combine vec_duplicate + vaaddu.vv to vaaddu.vx on GR2VR cost for HI, QI and SI mode
This patch combines vec_duplicate + vaaddu.vv into vaaddu.vx. The
related pattern depends on the cost of vec_duplicate from GR2VR:
late-combine performs the combination if the GR2VR cost is zero, and
rejects it if the GR2VR cost is greater than zero.
Assume we have example code like the following, with a GR2VR cost of 0.
aarch64: Avoid INS-(W|X)ZR instructions when optimising for speed
For inserting zero into a vector lane we usually use an instruction like:
ins v0.h[2], wzr
This, however, has not-so-great performance on some CPUs.
On Grace, for example, it has a latency of 5 and a throughput of 1.
The alternative sequence:
movi v31.8b, #0
ins v0.h[2], v31.h[0]
is preferable because the MOVI-0 is often a zero-latency operation that is
eliminated by the CPU frontend, and the lane-to-lane INS has a latency of 2
and a throughput of 4.
We can avoid the merging of the two instructions into the aarch64_simd_vec_set_zero<mode>
by disabling that pattern when optimizing for speed.
Thanks to wider benchmarking from Tamar, it makes sense to make this change for
all tunings, so no RTX costs or tuning flags are introduced to control this
in a more fine-grained manner. They can be easily added in the future if needed
for a particular CPU.
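At the source level, the operation in question is simply writing zero into one vector lane. A function like the one below (GCC generic vector extensions; the type and function names are illustrative) is the kind of code that, on aarch64, may be expanded either to `ins v0.h[2], wzr` or, when optimising for speed after this change, to the MOVI plus lane-to-lane INS pair. The exact output depends on compiler version and flags.

```c
#include <assert.h>

/* 128-bit vector of eight 16-bit lanes (GCC vector extension).  */
typedef short v8hi __attribute__((vector_size(16)));

/* Insert zero into lane 2, leaving the other lanes unchanged.  */
v8hi set_lane2_zero(v8hi v)
{
  v[2] = 0;
  return v;
}
```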
Bootstrapped and tested on aarch64-none-linux-gnu.
aarch64: NFC - Make vec_* rtx costing logic consistent
The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets
the cost unconditionally to the movi, dup or extract fields of extra_cost,
when the normal practice in that function is to use extra_cost only when speed
is set. When speed is false the function should estimate the size cost only.
This patch makes the logic consistent by using the extra_cost fields to
increment the cost when speed is set. This requires reducing the extra_cost values
of the movi, dup and extract fields by COSTS_N_INSNS (1), as every insn being costed
has a cost of COSTS_N_INSNS (1) at the start of the function. The cost tables for
the CPUs are updated in line with this.
With these changes the testsuite is unaffected, so no costing decisions
change and this patch is purely a cleanup.
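The arithmetic of the table adjustment can be sketched as follows. COSTS_N_INSNS mirrors its definition in GCC's rtl.h; the struct, the field values and both functions are hypothetical simplifications, not the aarch64 rtx-costs code. Once every insn starts at COSTS_N_INSNS (1), reducing each table entry by COSTS_N_INSNS (1) leaves the speed cost unchanged, while the size cost no longer includes the table value.

```c
#include <assert.h>
#include <stdbool.h>

/* As in GCC's rtl.h: one "instruction" costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical cost-table entries (not a real tuning).  */
struct vec_extra_cost { int movi, dup, extract; };

/* Old scheme: the table value is used whether or not SPEED is set.  */
static int old_movi_cost(const struct vec_extra_cost *c, bool speed)
{
  (void) speed;  /* ignored: table value used unconditionally */
  return c->movi;
}

/* New scheme: start from COSTS_N_INSNS (1) and add the table's extra
   cost only when optimizing for speed.  */
static int new_movi_cost(const struct vec_extra_cost *c, bool speed)
{
  int cost = COSTS_N_INSNS (1);
  if (speed)
    cost += c->movi;
  return cost;
}
```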
Bootstrapped and tested on aarch64-none-linux-gnu.
Andrew Stubbs [Thu, 12 Jun 2025 16:58:33 +0000 (16:58 +0000)]
amdgcn: add DImode offsets for gather/scatter
Add new variants of the gather_load and scatter_store instructions that take the
offsets in DImode. This is not the natural width for offsets in the
instruction set, but we can use them to compute a vector of absolute addresses,
which does work.
This enables the autovectorizer to use gather/scatter in a number of additional
scenarios (one of which shows up in the SPEC HPC lbm benchmark).
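The idea of turning 64-bit offsets into absolute addresses can be modelled in scalar C. This is a hypothetical sketch, not GCN code: offsets are counted in elements for simplicity, and whether the real instructions scale offsets is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Gather with 64-bit (DImode) offsets: form an absolute address per lane,
   then load from it.  */
static void gather_i32(int32_t *dst, const int32_t *base,
                       const int64_t *offsets, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      const int32_t *addr = base + offsets[i];  /* absolute address */
      dst[i] = *addr;
    }
}

/* Matching scatter: store src[i] to the address formed from offsets[i].  */
static void scatter_i32(int32_t *base, const int64_t *offsets,
                        const int32_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int32_t *addr = base + offsets[i];
      *addr = src[i];
    }
}
```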
Andrew Stubbs [Thu, 12 Jun 2025 16:54:01 +0000 (16:54 +0000)]
amdgcn: Add ashlvNm, mulvNm macros
I need some extra shift varieties in the mode-independent code, but the macros
don't permit insns that don't have QI/HI variants. This fixes the problem, and
adds the new functions for the follow-up patch to use.
gcc/ChangeLog:
* config/gcn/gcn.cc (GEN_VNM_NOEXEC): Use USE_QHF.
(GEN_VNM): Likewise, and call for new ashl and mul variants.
Andrew Stubbs [Thu, 12 Jun 2025 16:57:23 +0000 (16:57 +0000)]
amdgcn: add more insn patterns using vec_duplicate
These new insns allow more efficient use of scalar inputs to 64-bit vector
add and mul. Also, the patch adjusts the existing mul.._dup because it was
actually a dup2 (the vec_duplicate is on the second input), and that was
inconveniently inconsistent.
The patterns are generally useful, but will be used directly by a follow-up
patch.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (add<mode>3_dup): New.
(add<mode>3_dup_exec): New.
(<su>mul<mode>3_highpart_dup<exec>): New.
(mul<mode>3_dup): Move the vec_duplicate to operand 1.
(mul<mode>3_dup_exec): New.
(vec_series<mode>): Adjust call to gen_mul<mode>3_dup.
* config/gcn/gcn.cc (gcn_expand_vector_init): Likewise.
Since genoutput has no information about hard register names, we cannot
statically verify those names in constraints of the machine description.
Therefore, we have to do it at runtime. Although verification shouldn't
be too expensive, restrict it to checking builds. This should be
sufficient since hard register constraints in machine descriptions
probably change rarely, and each commit should be tested with checking
anyway, or at the very least before a release is taken.
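The kind of run-time check described here can be sketched as follows. This is hypothetical and not the function genoutput emits: the register table and helper names are invented for illustration. In GCC, such a check would only run in checking builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical register-name table standing in for a target's names.  */
static const char *const reg_names[] = { "r0", "r1", "r2", "sp" };

/* Return the register number for NAME (LEN bytes), or -1 if unknown.  */
static int lookup_reg(const char *name, size_t len)
{
  for (size_t i = 0; i < sizeof reg_names / sizeof reg_names[0]; i++)
    if (strlen(reg_names[i]) == len && strncmp(reg_names[i], name, len) == 0)
      return (int) i;
  return -1;
}

/* Check that every "{name}" in constraint string C names a known register
   and that every '{' is closed.  */
static bool verify_reg_names(const char *c)
{
  while ((c = strchr(c, '{')) != NULL)
    {
      const char *end = strchr(c + 1, '}');
      if (end == NULL || lookup_reg(c + 1, (size_t) (end - c - 1)) < 0)
        return false;
      c = end + 1;
    }
  return true;
}
```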
gcc/ChangeLog:
* genoutput.cc (main): Emit function
verify_reg_names_in_constraints() for run-time validation.
(mdep_constraint_len): Deal with hard register constraints.
* output.h (verify_reg_names_in_constraints): New function
declaration.
* toplev.cc (backend_init): If checking is enabled, call into
verify_reg_names_in_constraints().
This implements error handling for hard register constraints including
potential conflicts with register asm operands.
In contrast to register asm operands, hard register constraints allow
more than one register per operand, and even more than one register
per alternative. For example, a valid constraint for an
operand is "{r0}{r1}m,{r2}". However, this also means that we have to
make sure that each register is used at most once in each alternative
over all outputs and likewise over all inputs. For asm statements this
is done by this patch during gimplification. For hard register
constraints used in machine descriptions, error handling is still a
TODO; I haven't investigated it so far and consider it a rather low
priority.
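The "at most once per alternative" rule can be illustrated with a simplified, text-level sketch. The helper names are hypothetical, and register names are compared as strings, which simplifies what gimplify_asm_expr does. The sketch is applied to one group of constraints at a time, for example all outputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return the start of alternative ALT in constraint C (alternatives are
   separated by ','), or NULL if C has fewer alternatives.  */
static const char *nth_alternative(const char *c, int alt)
{
  for (; alt > 0; alt--)
    {
      c = strchr(c, ',');
      if (c == NULL)
        return NULL;
      c++;
    }
  return c;
}

/* Return true if the same "{name}" occurs more than once in alternative
   ALT across the NCONS constraint strings in CONS.  At most 32 names are
   tracked in this sketch.  */
static bool duplicate_hard_reg_in_alt(const char *const *cons, int ncons,
                                      int alt)
{
  const char *names[32];
  size_t lens[32];
  int n = 0;

  for (int op = 0; op < ncons; op++)
    {
      const char *p = nth_alternative(cons[op], alt);
      if (p == NULL)
        continue;
      const char *end = strchr(p, ',');
      if (end == NULL)
        end = p + strlen(p);
      for (; p < end; p++)
        {
          if (*p != '{')
            continue;
          const char *close = memchr(p, '}', (size_t) (end - p));
          if (close == NULL)
            break;
          size_t len = (size_t) (close - p - 1);
          for (int i = 0; i < n; i++)
            if (lens[i] == len && strncmp(names[i], p + 1, len) == 0)
              return true;
          if (n < 32)
            {
              names[n] = p + 1;
              lens[n++] = len;
            }
          p = close;  /* the loop's p++ then steps past '}' */
        }
    }
  return false;
}
```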
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (gnat_to_gnu): Pass null pointer to
parse_{input,output}_constraint().
gcc/analyzer/ChangeLog:
* region-model-asm.cc (region_model::on_asm_stmt): Pass null
pointer to parse_{input,output}_constraint().
gcc/c/ChangeLog:
* c-typeck.cc (build_asm_expr): Pass null pointer to
parse_{input,output}_constraint().
gcc/ChangeLog:
* cfgexpand.cc (n_occurrences): Move this ...
(check_operand_nalternatives): and this ...
(expand_asm_stmt): and the call to gimplify.cc.
* config/s390/s390.cc (s390_md_asm_adjust): Pass null pointer to
parse_{input,output}_constraint().
* gimple-walk.cc (walk_gimple_asm): Pass null pointer to
parse_{input,output}_constraint().
(walk_stmt_load_store_addr_ops): Ditto.
* gimplify-me.cc (gimple_regimplify_operands): Ditto.
* gimplify.cc (num_occurrences): Moved from cfgexpand.cc.
(num_alternatives): Ditto.
(gimplify_asm_expr): Deal with hard register constraints.
* stmt.cc (eliminable_regno_p): New helper.
(hardreg_ok_p): Perform a similar check as done in
make_decl_rtl().
(parse_output_constraint): Add parameter for gimplify_reg_info
and validate hard register constrained operands.
(parse_input_constraint): Ditto.
* stmt.h (class gimplify_reg_info): Forward declaration.
(parse_output_constraint): Add parameter.
(parse_input_constraint): Ditto.
* tree-ssa-operands.cc
(operands_scanner::get_asm_stmt_operands): Pass null pointer
to parse_{input,output}_constraint().
* tree-ssa-structalias.cc (find_func_aliases): Pass null pointer
to parse_{input,output}_constraint().
* varasm.cc (assemble_asm): Pass null pointer to
parse_{input,output}_constraint().
* gimplify_reg_info.h: New file.
gcc/cp/ChangeLog:
* semantics.cc (finish_asm_stmt): Pass null pointer to
parse_{input,output}_constraint().
gcc/d/ChangeLog:
* toir.cc: Pass null pointer to
parse_{input,output}_constraint().
gcc/testsuite/ChangeLog:
* gcc.dg/pr87600-2.c: Split test into two files since errors for
functions test{0,1} are thrown during expand, and for
test{2,3} during gimplification.
* lib/scanasm.exp: On s390, skip lines beginning with #.
* gcc.dg/asm-hard-reg-error-1.c: New test.
* gcc.dg/asm-hard-reg-error-2.c: New test.
* gcc.dg/asm-hard-reg-error-3.c: New test.
* gcc.dg/asm-hard-reg-error-4.c: New test.
* gcc.dg/asm-hard-reg-error-5.c: New test.
* gcc.dg/pr87600-3.c: New test.
* gcc.target/aarch64/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-7.c: New test.
Implement hard register constraints of the form {regname} where regname
must be a valid register name for the target. Such constraints may be
used in asm statements as a replacement for register asm and in machine
descriptions. A more verbose description is given in extend.texi.
It is expected and desired that optimizations coalesce multiple pseudos
into one whenever possible. However, in the case of hard register
constraints we may have to undo this and introduce copies, since
otherwise we would constrain a single pseudo to multiple hard
registers. This is done prior to RA during asmcons in
match_asm_constraints_2(). While IRA tries to reduce live ranges, it
also replaces some register-register moves. That in turn might undo
those copies of a pseudo which we just introduced during asmcons. Thus,
check in decrease_live_ranges_number() via
valid_replacement_for_asm_input_p() whether it is valid to perform a
replacement.
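The copy-introduction step can be loosely modelled on integer pseudo numbers. This is a hypothetical sketch of the idea only. Real RTL, the emitted move insns and the IRA-side validity check are omitted, and the function name is invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Operand i uses pseudo ops[i]; hardreg[i] is true if its constraint names
   a hard register.  If such an operand shares its pseudo with another
   operand, give it a fresh pseudo (standing for a copy emitted before the
   asm), so that no pseudo is tied to more than one hard register.
   Returns the number of copies introduced.  */
static int split_shared_pseudos(int *ops, const bool *hardreg, int n,
                                int *next_pseudo)
{
  int copies = 0;
  for (int i = 0; i < n; i++)
    {
      if (!hardreg[i])
        continue;
      for (int j = 0; j < n; j++)
        if (j != i && ops[j] == ops[i])
          {
            ops[i] = (*next_pseudo)++;  /* new = old, before the asm */
            copies++;
            break;
          }
    }
  return copies;
}
```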
The remainder of the patch mostly deals with parsing and decoding hard
register constraints. The actual work is done by LRA in
process_alt_operands() where a register filter, according to the
constraint, is installed.
For the sake of "reviewability" and in order to show the beauty of LRA,
error handling (which gets pretty involved) is spread out into a
subsequent patch.
Limitation
----------
Currently, a fixed register cannot be used as a hard register constraint.
For example, loading the stack pointer on x86_64 via
Most of them only add the CC register to the list of clobbered registers.
However, cris, i386, and s390 need some minor adjustment.
gcc/ChangeLog:
* config/cris/cris.cc (cris_md_asm_adjust): Deal with hard
register constraint.
* config/i386/i386.cc (map_egpr_constraints): Ditto.
* config/s390/s390.cc (f_constraint_p): Ditto.
* doc/extend.texi: Document hard register constraints.
* doc/md.texi: Ditto.
* function.cc (match_asm_constraints_2): Have a unique pseudo
for each operand with a hard register constraint.
(pass_match_asm_constraints::execute): Calling into new helper
match_asm_constraints_2().
* genoutput.cc (mdep_constraint_len): Return the length of a
hard register constraint.
* genpreds.cc (write_insn_constraint_len): Support hard register
constraints for insn_constraint_len().
* ira.cc (valid_replacement_for_asm_input_p_1): New helper.
(valid_replacement_for_asm_input_p): New helper.
(decrease_live_ranges_number): Similar to
match_asm_constraints_2() ensure that each operand has a unique
pseudo if constrained by a hard register.
* lra-constraints.cc (process_alt_operands): Install hard
register filter according to constraint.
* recog.cc (asm_operand_ok): Accept register type for hard
register constrained asm operands.
(constrain_operands): Validate hard register constraints.
* stmt.cc (decode_hard_reg_constraint): Parse a hard register
constraint into the corresponding register number or bail out.
(parse_output_constraint): Parse hard register constraint and
set *ALLOWS_REG.
(parse_input_constraint): Ditto.
* stmt.h (decode_hard_reg_constraint): Declaration of new
function.
gcc/testsuite/ChangeLog:
* gcc.dg/asm-hard-reg-1.c: New test.
* gcc.dg/asm-hard-reg-2.c: New test.
* gcc.dg/asm-hard-reg-3.c: New test.
* gcc.dg/asm-hard-reg-4.c: New test.
* gcc.dg/asm-hard-reg-5.c: New test.
* gcc.dg/asm-hard-reg-6.c: New test.
* gcc.dg/asm-hard-reg-7.c: New test.
* gcc.dg/asm-hard-reg-8.c: New test.
* gcc.target/aarch64/asm-hard-reg-1.c: New test.
* gcc.target/i386/asm-hard-reg-1.c: New test.
* gcc.target/i386/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-1.c: New test.
* gcc.target/s390/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-3.c: New test.
* gcc.target/s390/asm-hard-reg-4.c: New test.
* gcc.target/s390/asm-hard-reg-5.c: New test.
* gcc.target/s390/asm-hard-reg-6.c: New test.
* gcc.target/s390/asm-hard-reg-longdouble.h: New test.
Richard Biener [Mon, 21 Jul 2025 08:40:13 +0000 (10:40 +0200)]
Remove bogus minimum VF compute
The following removes the minimum VF compute from dataref analysis
which does not take into account SLP at all, leaving the testcase
vectorized with V2SImode instead of V4SImode on x86. With SLP
the only minimum VF we can compute this early is 1.
* tree-vectorizer.h (vect_analyze_data_refs): Remove min_vf
output.
* tree-vect-data-refs.cc (vect_analyze_data_refs): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_2): Remove early
out based on bogus min_vf.
* tree-vect-slp.cc (vect_slp_analyze_bb_1): Adjust.
Mikael Morin [Mon, 21 Jul 2025 08:31:35 +0000 (10:31 +0200)]
fortran: Factor array descriptor references
Save subexpressions of array descriptor references to variables, so that
all the expressions using the descriptor as base object benefit from a
simplified reference using the variables.
This limits the size of the expressions generated in the original tree
dump, easing analysis of the code involving those expressions.
This is especially helpful with chains of array references where each
array in the chain uses a descriptor.
After optimizations, the effect of the change shouldn't be visible in
the vast majority of cases. In rare cases it seems to permit a couple
more jump threadings.
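gfortran does this on GENERIC trees rather than on C source, but the shape of the change can be illustrated with a C analogue. The descriptor layout below is a hypothetical simplification, not gfortran's. The two functions compute the same value. The second one saves the descriptor subexpressions to variables once, so the references in the loop body are shorter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified rank-1 array descriptor.  */
struct desc_dim { ptrdiff_t stride, lbound, ubound; };
struct descriptor { double *data; ptrdiff_t offset; struct desc_dim dim[1]; };

/* Unfactored: each access re-derives data, offset and stride from D.  */
static double sum_unfactored(const struct descriptor *d)
{
  double s = 0.0;
  for (ptrdiff_t i = d->dim[0].lbound; i <= d->dim[0].ubound; i++)
    s += d->data[d->offset + i * d->dim[0].stride];
  return s;
}

/* Factored: the subexpressions are saved to variables up front.  */
static double sum_factored(const struct descriptor *d)
{
  double *const data = d->data;
  const ptrdiff_t offset = d->offset;
  const ptrdiff_t stride = d->dim[0].stride;
  double s = 0.0;
  for (ptrdiff_t i = d->dim[0].lbound; i <= d->dim[0].ubound; i++)
    s += data[offset + i * stride];
  return s;
}
```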
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_ss_descriptor): Move the descriptor
expression initialisation...
(set_factored_descriptor_value): ... to this new function.
Before initialisation, walk the reference expression passed as
argument and save some of its subexpressions to a variable.
(substitute_t): New struct.
(maybe_substitute_expr): New function.
(substitute_subexpr_in_expr): New function.
RISC-V: Add testcase for unsigned scalar SAT_ADD form 8 and form 9
This patch adds testcases for form 8 and form 9, as shown below:
T __attribute__((noinline)) \
sat_u_add_##T##_fmt_8(T x, T y) \
{ \
  return x <= (T)(x + y) ? (x + y) : -1; \
}
T __attribute__((noinline)) \
sat_u_add_##T##_fmt_9(T x, T y) \
{ \
  return x > (T)(x + y) ? -1 : (x + y); \
}
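The two forms can be instantiated and checked against a straightforward "add wide, then clamp" reference. The macro names DEF_SAT_U_ADD_FMT_8/9 and the reference helper are hypothetical wrappers around the function bodies quoted above. For unsigned T, the wrapped sum is smaller than x exactly when the addition overflowed, and `-1` converts to the type's maximum value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro wrappers around the two quoted forms.  */
#define DEF_SAT_U_ADD_FMT_8(T)                \
T __attribute__((noinline))                   \
sat_u_add_##T##_fmt_8(T x, T y)               \
{                                             \
  return x <= (T)(x + y) ? (x + y) : -1;      \
}

#define DEF_SAT_U_ADD_FMT_9(T)                \
T __attribute__((noinline))                   \
sat_u_add_##T##_fmt_9(T x, T y)               \
{                                             \
  return x > (T)(x + y) ? -1 : (x + y);       \
}

DEF_SAT_U_ADD_FMT_8(uint8_t)
DEF_SAT_U_ADD_FMT_9(uint8_t)
DEF_SAT_U_ADD_FMT_8(uint32_t)
DEF_SAT_U_ADD_FMT_9(uint32_t)

/* Reference: add in a wider type, then clamp to the maximum.  */
static uint32_t ref_sat_u_add_u32(uint32_t x, uint32_t y)
{
  uint64_t sum = (uint64_t) x + y;
  return sum > UINT32_MAX ? UINT32_MAX : (uint32_t) sum;
}
```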
Passed the rv64gc regression test.
Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Unsigned testcase form8 form9.
* gcc.target/riscv/sat/sat_u_add-8-u16.c: New test.
* gcc.target/riscv/sat/sat_u_add-8-u32.c: New test.
* gcc.target/riscv/sat/sat_u_add-8-u64.c: New test.
* gcc.target/riscv/sat/sat_u_add-8-u8.c: New test.
* gcc.target/riscv/sat/sat_u_add-9-u16.c: New test.
* gcc.target/riscv/sat/sat_u_add-9-u32.c: New test.
* gcc.target/riscv/sat/sat_u_add-9-u64.c: New test.
* gcc.target/riscv/sat/sat_u_add-9-u8.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-8-u16.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-8-u32.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-8-u64.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-8-u8.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-9-u16.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-9-u32.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-9-u64.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-9-u8.c: New test.
Thomas Schwinge [Fri, 18 Jul 2025 10:56:13 +0000 (12:56 +0200)]
Adjust 'libgomp.c++/target-cdtor-{1,2}.C' for 'targetm.cxx.use_aeabi_atexit' [PR119853, PR119854]
Fix-up for commit aafe942227baf8c2bcd4cac2cb150e49a4b895a9
"GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]":
we need to adjust for 'targetm.cxx.use_aeabi_atexit':
gcc/config/arm/arm.cc:/* The EABI says __aeabi_atexit should be used to register static
gcc/config/arm/arm.cc- destructors. */
gcc/config/arm/arm.cc-
gcc/config/arm/arm.cc-static bool
gcc/config/arm/arm.cc:arm_cxx_use_aeabi_atexit (void)
gcc/config/arm/arm.cc-{
gcc/config/arm/arm.cc- return TARGET_AAPCS_BASED;
gcc/config/arm/arm.cc-}
..., which 'gcc/cp/decl.cc:get_atexit_node' then acts on: call '__aeabi_atexit'
instead of '__cxa_atexit', and swap two arguments.
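The argument swap follows from the two ABIs' signatures: the Itanium C++ ABI has `__cxa_atexit (func, arg, dso_handle)`, while the ARM EABI has `__aeabi_atexit (object, destroyer, dso_handle)`. The sketch below uses hypothetical stand-ins that only record their arguments, so it runs on any host. It shows the dispatch that a use_aeabi_atexit-style hook implies.

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*dtor_fn)(void *);

/* Recorded arguments of the most recent registration.  */
static void *last_obj;
static dtor_fn last_dtor;

/* Stand-in with the __cxa_atexit argument order: (func, arg, dso).  */
static int fake_cxa_atexit(dtor_fn func, void *arg, void *dso)
{
  (void) dso;
  last_dtor = func;
  last_obj = arg;
  return 0;
}

/* Stand-in with the __aeabi_atexit argument order: (object, destroyer, dso).  */
static int fake_aeabi_atexit(void *object, dtor_fn destroyer, void *dso)
{
  (void) dso;
  last_dtor = destroyer;
  last_obj = object;
  return 0;
}

/* Register DTOR for OBJ; the EABI entry point takes the first two
   arguments in swapped order.  */
static int register_dtor(bool use_aeabi_atexit, dtor_fn dtor, void *obj,
                         void *dso)
{
  return use_aeabi_atexit ? fake_aeabi_atexit(obj, dtor, dso)
                          : fake_cxa_atexit(dtor, obj, dso);
}

static void noop_dtor(void *p) { (void) p; }
```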
Jakub Jelinek [Sun, 20 Jul 2025 06:12:57 +0000 (08:12 +0200)]
libstdc++: Export std::dextents from std.cc.in [PR121174]
r16-442 implemented both std::extents and std::dextents (and perhaps other
stuff), but exported only std::extents.
I went through https://eel.is/c++draft/mdspan.syn and I think std::dextents
is the only one implemented but not exported.
The following patch exports it, and additionally appends to the FIXME
list some further entities, all of which seem to be not yet implemented.
2025-07-20 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/121174
* src/c++23/std.cc.in (std::dextents): Export. Add to FIXME comments
other not yet implemented nor exported <mdspan> entities.
Andrew Pinski [Sun, 20 Jul 2025 02:11:09 +0000 (19:11 -0700)]
testsuite: Fix afdo-crossmodule-1b.c [PR120859]
The problem here is that the testcase is part of another
testcase, but dg-final does not work across source files,
so it needs its own dg-* headers that match up with
afdo-crossmodule-1.c.
Pushed as preapproved in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859#c4 .
PR testsuite/120859
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-crossmodule-1b.c: Add dg-*
commands matching those in afdo-crossmodule-1.c.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Pan Li [Sat, 19 Jul 2025 02:49:15 +0000 (10:49 +0800)]
RISC-V: Refine the test case for vector avg_floor and avg_ceil [NFC]
The previous test case doesn't use the right test helper macro:
it should be DEF_AVG_0_WRAP instead of DEF_AVG_0. For DEF_AVG_0(WT, NT)
we want the test function to be named test_avg_floor_int64_t_int32_t_0
rather than test_avg_floor_WT_NT_0.
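The reason a _WRAP macro changes the name is a C preprocessor rule: operands of `##` are pasted without being macro-expanded first, whereas arguments forwarded through an extra macro level are fully expanded. The sketch below assumes WT and NT are object-like macros naming the types. AVG_NAME, AVG_NAME_WRAP and the stringification helpers are hypothetical stand-ins, not the testsuite's macros.

```c
#include <assert.h>
#include <string.h>

/* Assumption: the element types are provided as macros.  */
#define WT int64_t
#define NT int32_t

/* Direct paste: ## suppresses expansion, so "WT" and "NT" are pasted
   literally.  */
#define AVG_NAME(A, B) test_avg_floor_##A##_##B##_0

/* Wrapper: A and B are expanded before being forwarded to AVG_NAME.  */
#define AVG_NAME_WRAP(A, B) AVG_NAME(A, B)

/* Turn the resulting identifier into a string for inspection.  */
#define STR_(x) #x
#define STR(x) STR_(x)

static const char *const direct_name = STR(AVG_NAME(WT, NT));
static const char *const wrapped_name = STR(AVG_NAME_WRAP(WT, NT));
```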
The following test suites pass with this patch.
* The rv64gcv full regression test.
pru: Use signed HOST_WIDE_INT for handling ctable addresses
The ctable base address for SBCO/LBCO load/store patterns was
incorrectly stored as an unsigned integer. That prevented matching
addresses with bit 31 set, because const_int RTL expressions are expected
to be sign-extended.
Fix by using sign-extended 32-bit values for ctable base addresses.
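A 32-bit address with bit 31 set appears in a const_int as a negative HOST_WIDE_INT. A zero-extended copy of the same address therefore never compares equal to it. The sketch uses int64_t as a stand-in for HOST_WIDE_INT (which is 64 bits in current GCC). The sign extension is written to avoid implementation-defined unsigned-to-signed conversions. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits using only well-defined
   arithmetic: flip bit 31, widen, then subtract 2^31.  */
static int64_t sext32(uint32_t v)
{
  return (int64_t) (v ^ 0x80000000u) - INT64_C(0x80000000);
}

/* Zero extension: what storing the base as unsigned effectively did.  */
static int64_t zext32(uint32_t v)
{
  return (int64_t) v;
}
```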
PR target/121124
gcc/ChangeLog:
* config/pru/pru-pragma.cc (pru_pragma_ctable_entry): Handle the
ctable base address as signed 32-bit value, and sign-extend to
HOST_WIDE_INT.
* config/pru/pru-protos.h (struct pru_ctable_entry): Store the
ctable base address as signed.
(pru_get_ctable_exact_base_index): Pass base address as signed.
(pru_get_ctable_base_index): Ditto.
(pru_get_ctable_base_offset): Ditto.
* config/pru/pru.cc (pru_get_ctable_exact_base_index): Ditto.
(pru_get_ctable_base_index): Ditto.
(pru_get_ctable_base_offset): Ditto.
(pru_print_operand_address): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/pru/pragma-ctable_entry-2.c: New test.
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a (possibly negated) minus-mult
RTL instruction.
Before this patch, we have six instructions, e.g.:
vsetivli zero,4,e32,m1,ta,ma
fcvt.s.h fa5,fa5
vfmv.v.f v4,fa5
vfwcvt.f.f.v v1,v3
vsetvli zero,zero,e32,m1,ta,ma
vfnmadd.vv v1,v4,v2
After, we get only one:
vfwnmacc.vf v1,fa5,v2
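Per the RVV specification, vfwnmacc.vf computes `vd[i] = -(f[rs1] * vs2[i]) - vd[i]` and vfwnmsac.vf computes `vd[i] = -(f[rs1] * vs2[i]) + vd[i]`, with the narrow operands widened first. The scalar model below uses float to double as a stand-in for f16 to f32. Unlike the real fused instructions, it rounds the product and the sum separately. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Model of vfwnmacc.vf: acc[i] = -(s * b[i]) - acc[i], widening first.  */
static void wnmacc_vf(double *acc, float s, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] = -((double) s * (double) b[i]) - acc[i];
}

/* Model of vfwnmsac.vf: acc[i] = -(s * b[i]) + acc[i], widening first.  */
static void wnmsac_vf(double *acc, float s, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] = -((double) s * (double) b[i]) + acc[i];
}
```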
PR target/119100
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfwnmacc_vf_<mode>): New pattern.
(*vfwnmsac_vf_<mode>): New pattern.
* config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add support for a
vec_duplicate in a neg.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwnmacc and
vfwnmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f32.c: New test.
[PATCH] RISC-V: prevent NULL_RTX dereference in riscv_macro_fusion_pair_p ()
> A number of folks have had their fingers in this code and it's going to take
> a few submissions to do everything we want to do.
>
> This patch is primarily concerned with avoiding signaling that fusion can
> occur in cases where it obviously should not be signaling fusion.
Hi Jeff,
With this change, we're liable to ICE whenever prev_set or curr_set are
NULL_RTX. For a fix, how about something like the below?
Thanks,
Artemiy
The initializers for {prev,curr}_dest_regno, introduced in
r16-1984-g83d19b5d842dad, can cause an ICE if the respective insn isn't
a single set. Rectify this by inserting a NULL_RTX check before using
{prev,curr}_set.
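The shape of the fix is a null check before reading through the result of a single-set query. The types and helpers below are hypothetical stand-ins for RTL. The real code works on rtx values with REGNO (SET_DEST (...)), and returning -1 for "not a single set" is an assumption of this sketch rather than what the patch necessarily does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: an insn may or may not be a single set.  */
struct set_info { int dest_regno; };
struct insn_info { struct set_info *single; };

/* Like single_set (), returns NULL when the insn is not a single set.  */
static struct set_info *single_set_of(const struct insn_info *insn)
{
  return insn->single;
}

/* Guarded access: only dereference the set when it exists.  */
static int dest_regno_or_invalid(const struct insn_info *insn)
{
  struct set_info *s = single_set_of(insn);
  return s != NULL ? s->dest_regno : -1;
}
```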
Regtested on riscv32.
gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Protect
from a NULL PREV_SET or CURR_SET.
Jonathan Wakely [Fri, 18 Jul 2025 23:08:26 +0000 (00:08 +0100)]
libstdc++: Only define __any_input_iterator for C++20
Currently, this new concept will be defined for -std=c++17 -fconcepts,
but as it uses std::input_iterator, which is new in C++20, that won't
work. Guard it with __cpp_lib_concepts as well as __cpp_concepts.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator_base_types.h (__any_input_iterator):
Only define when __cpp_lib_concepts is defined.
Andrew Pinski [Fri, 18 Jul 2025 17:07:34 +0000 (10:07 -0700)]
testsuite/vec: Fix vect-reduc-cond-[12].c for non vect_condition targets [PR121153]
I missed this when I added the two testcases vect-reduc-cond-[12].c. These testcases
require support for vectorizing `a ? b : c`, which some targets (e.g. sparc) do
not support.
Jonathan Wakely [Fri, 18 Jul 2025 16:44:45 +0000 (17:44 +0100)]
libstdc++: Remove Paolo from list of people to contact about contributing
Paolo has not been active for some time.
libstdc++-v3/ChangeLog:
* doc/xml/manual/appendix_contributing.xml: Remove Paolo from
list of maintainers to contact about contributing.
* doc/html/manual/appendix_contributing.html: Regenerate.
Pan Li [Wed, 16 Jul 2025 13:40:14 +0000 (21:40 +0800)]
RISC-V: Support RVVDImode for avg3_ceil auto vect
Like the avg3_floor pattern, avg3_ceil has the same
issue: it lacks RVV DImode support.
Thus, this patch adds DImode support for the
standard name, using the iterator V_VLSI_D.
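The test names (i64-from-i128) suggest that the scalar source computes the rounding-up average with a 128-bit intermediate. Below is such a reference (`__int128` is a GCC/Clang extension on 64-bit targets), alongside an overflow-free 64-bit formulation. Which form the vectorizer recognises is not stated here. Both functions are illustrative sketches.

```c
#include <assert.h>
#include <stdint.h>

/* ceil ((a + b) / 2) via a 128-bit intermediate, which cannot overflow.  */
static int64_t avg_ceil_i128(int64_t a, int64_t b)
{
  return (int64_t) (((__int128) a + b + 1) >> 1);
}

/* Same value in 64 bits: (a | b) - ((a ^ b) >> 1).  Relies on >> of a
   negative value being an arithmetic shift, as GCC implements it.  */
static int64_t avg_ceil_i64(int64_t a, int64_t b)
{
  return (a | b) - ((a ^ b) >> 1);
}
```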
The following test suites pass with this patch series.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (avg<mode>3_ceil): Add new pattern
of avg3_ceil for RVV DImode
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/avg_data.h: Adjust the test data.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i64-from-i128.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i64-from-i128.c: New test.
Tomasz Kamiński [Fri, 18 Jul 2025 09:30:22 +0000 (11:30 +0200)]
libstdc++: Fixed localized empty-spec formatting for months/weekdays [PR121154]
Previously, for localized output, if the _M_debug option was set, _M_check_ok
completed successfully and _M_locale_fmt was called for months/weekdays that
are !ok().
This patch lifts the debug checks from each conversion function into
_M_check_ok, which, for !ok() values, returns a string_view naming the kind
of calendar data, to be included after the "is not a valid" string. The
localized output (_M_locale_fmt) is not used if that string is non-empty.
Emitting this message is now handled in _M_format_to, further simplifying
each specifier function.
To handle weekday (%a,%A) and month (%b,%B), _M_check_ok now accepts a
mutable reference to conversion specifier, and updates it to corresponding
numeric value (%w, %m). Extra care needs to be taken to handle month(0),
which needs to be printed as a single digit in debug format.
Finally, the _M_time_point member is replaced with _M_needs_ok_check, which
indicates whether the input contains any user-supplied values that are checked
for being ok() and whether these values are referenced in the chrono-specs.
PR libstdc++/121154
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (_ChronoSpec::_M_time_point): Remove.
(_ChronoSpec::_M_needs_ok_check): Define
(__formatter_chrono::_M_parse): Set _M_needs_ok_check.
(__formatter_chrono::_M_check_ok): Check values also for debug mode,
and return __string_view.
(__formatter_chrono::_M_format_to): Handle results of _M_check_ok.
(__formatter_chrono::_M_wi, __formatter_chrono::_M_a_A)
(__formatter_chrono::_M_b_B, __formatter_chrono::_M_C_y_Y)
(__formatter_chrono::_M_d_e, __formatter_chrono::_M_F):
Removed handling of _M_debug.
(__formatter_chrono::__M_m): Print zero unpadded in _M_debug mode.
(__formatter_duration::_S_spec_for): Remove _M_time_point reference.
(__formatter_duration::_M_parse): Override _M_needs_ok_check.
* testsuite/std/time/month/io.cc: Test for localized !ok() values.
* testsuite/std/time/weekday/io.cc: Test for localized !ok() values.
Martin Jambor [Fri, 18 Jul 2025 10:42:11 +0000 (12:42 +0200)]
tree-sra: Fix grp_covered flag computation when totally scalarizing (PR117423)
The testcase of PR 117423 shows a flaw in the fancy way we do "total
scalarization" in SRA now. We use the types encountered in the
function body and not in the type declaration (allowing us to totally
scalarize when only one union field is ever used, since we effectively
"skip" the union then) and can accommodate pre-existing accesses that
happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check), and the access falling into the
"padding" is an aggregate, and so not a candidate for SRA, but it actually
contains data. Arguably, total scalarization should just bail out
when it encounters this situation (but I decided not to depend on this,
mainly because we'd need to detect all cases when we eventually cannot
scalarize, such as when a scalar access has children accesses). The
actual bug, however, is that the check of whether all data in an aggregate
is indeed covered by replacements simply assumes this is always the case
when total scalarization triggers, which is not true in cases like
this one (and perhaps others).
This patch fixes the bug by only assuming that all padding is taken
care of when total scalarization triggers, rather than assuming that
every access was actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
Jonathan Wakely [Fri, 18 Jul 2025 08:55:13 +0000 (09:55 +0100)]
libstdc++: Fix hash<__int128> test for x32 [PR121150]
I incorrectly assumed that all targets that support __int128 use the
LP64 ABI, so size_t is a 64-bit type. But x32 uses ILP32 and still
supports __int128 (because it's an ILP32 target on 64-bit hardware).
Add casts to the tests so that we get the correct expected values using
the size_t type.
libstdc++-v3/ChangeLog:
PR libstdc++/121150
* testsuite/20_util/hash/int128.cc: Cast expected values to
size_t.
Jonathan Wakely [Mon, 14 Jul 2025 19:15:12 +0000 (20:15 +0100)]
libstdc++: Implement reverse iteration for _Utf_view
This implements the missing functions in _Utf_iterator to support
reverse iteration. All existing tests pass when the view is reversed, so
that the same code units are seen when iterating forwards or backwards.
libstdc++-v3/ChangeLog:
* include/bits/unicode.h (_Utf_iterator::operator--): Reorder
conditions and update position after reading a code unit.
(_Utf_iterator::_M_read_reverse): Define.
(_Utf_iterator::_M_read_utf8): Return extracted code point.
(_Utf_iterator::_M_read_reverse_utf8): Define.
(_Utf_iterator::_M_read_reverse_utf16): Define.
(_Utf_iterator::_M_read_reverse_utf32): Define.
* testsuite/ext/unicode/view.cc: Add checks for reversed views
and reverse iteration.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Wed, 16 Jul 2025 23:11:49 +0000 (00:11 +0100)]
libstdc++: Optimize _Utf_iterator for size
This reorders the data members of _Utf_iterator to avoid padding bytes
between members due to alignment requirements. For x86_64 the previous
layout had padding after _M_buf and after _M_to_increment for the common
case where the iterator and sentinel types are pointers, so the size
shrinks from 40 bytes to 32 bytes. (For i686 there's no change; it's
still 20 bytes.)
We could compress the three uint8_t members into one byte by using
bit-fields. But there doesn't seem to be any point, because it will just be slower
to access them and there will be tail padding, so the size isn't any
smaller. We could also reduce _M_buf_last and _M_to_increment to 2 bits
because the 0 value is only used for a default-constructed iterator, and
we don't actually care about the values in that case. Again, this
doesn't seem worth doing.
libstdc++-v3/ChangeLog:
* include/bits/unicode.h (_Utf_iterator): Reorder data members
to be more compact.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Richard Biener [Fri, 18 Jul 2025 07:02:09 +0000 (09:02 +0200)]
tree-optimization/120924 - up --param uninit-max-chain-len
The PR shows that the uninit analysis limits are set too low in
cases where we lower switches to ifs, as happens on s390x for a Linux
kernel TU. This causes false-positive uninit diagnostics because we
abort the attempt to prove that a value is initialized on all
paths. The new testcase alone would only require raising the limit to 9.
PR tree-optimization/120924
* params.opt (uninit-max-chain-len): Up from 8 to 12.
Steve Baird [Thu, 26 Jun 2025 19:23:53 +0000 (12:23 -0700)]
ada: Spurious actual/formal matching check failure for formal derived type.
In some cases involving a generic with two formal parameters, a formal package
and a formal derived type that is derived from an interface type declared in
the formal package, a legal instantiation of that generic is rejected with a
message incorrectly stating that the second actual parameter does not implement
the required interface.
gcc/ada/ChangeLog:
* sem_ch12.adb (Validate_Derived_Type_Instance): Cope with the case
where the ancestor type for a formal derived type is declared in
an earlier formal package but Get_Instance_Of does not return the
corresponding type from the corresponding actual package.
Marc Poulhiès [Fri, 20 Jun 2025 14:10:25 +0000 (16:10 +0200)]
ada: Do not inline function returning on the secondary stack
When inlining calls to functions that return on the secondary stack, where
the call is used as an actual parameter or in a return statement, the
compiler creates an invalid GNAT tree with a variable of an unconstrained
type without an initializer.
Also add an extra assertion to catch problematic cases directly in
Expand_Inlined_Call.
gcc/ada/ChangeLog:
* exp_ch6.adb (Convert): Do not call Expand_Inlined_Call for
unsupported cases.
* inline.adb (Expand_Inlined_Call): Add assert to catch unsupported
case.
Co-authored-by: Eric Botcazou <botcazou@adacore.com>
Gary Dismukes [Tue, 17 Jun 2025 21:55:58 +0000 (21:55 +0000)]
ada: Incorrect resolution of prefixed calls with overriding private subprogram
The compiler incorrectly treats an overriding private subprogram that
should not be visible outside a package (because it only overrides in
the private part) as a possible interpretation for a call using prefixed
notation outside of the package. This can result in an ambiguity if there
is another subprogram with the same name but a different profile declared
in the visible part of the package, or can result in resolving to the
private operation in cases where it shouldn't resolve. This happens because
the compiler improperly concludes that the private overriding subprogram
overrides an inherited subprogram in the package visible part, even though
the only inherited subprogram is in the private part. The root cause is
a misuse of the Overridden_Operation field, which, contrary to what
its name suggests, actually refers to operations of the parent type,
rather than to the operations derived from the parent's operations.
gcc/ada/ChangeLog:
* einfo.ads: Document new field Overridden_Inherited_Operation and
list it as a field for the entity kinds that it applies to.
* gen_il-fields.ads (type Opt_Field_Enum): Add new literal
Overridden_Inherited_Operation to the type.
* gen_il-gen-gen_entities.adb: Add Overridden_Inherited_Operation as
a field of entities of kinds E_Enumeration_Literal and Subprogram_Kind.
* sem_ch4.adb (Is_Callable_Private_Overriding): Change name (was
Is_Private_Overriding). Replace Is_Hidden test on Overridden_Operation
with test of Is_Hidden on the new field Overridden_Inherited_Operation.
* sem_ch6.adb (New_Overloaded_Entity): Set the new field
Overridden_Inherited_Operation on an operation derived from
an interface to refer to the inherited operation of a private
extension that's overridden by the derived operation. Also set
that field in the more common cases of an explicit subprogram
that overrides, to refer to the inherited subprogram that is
overridden. (Contrary to its name, the Overridden_Operation
field of the overriding subprogram, which is also set in these
places, refers to the *parent* subprogram from which the inherited
subprogram is derived.) Also, remove a redundant Present (Alias (S))
test in an if_statement and the dead "else" part of that statement.
Piotr Trojanek [Thu, 26 Jun 2025 12:57:14 +0000 (14:57 +0200)]
ada: Elaboration entity must not be ghost in ghost generic instances
For non-instance units, GNAT builds elaboration entities before the ghost mode
is inherited from those units. However, for generic instances, GNAT was building
elaboration entities with the ghost mode inherited from those instances, which
effectively caused elaboration entities to become ghost objects.
This patch adds ghost mode management to the routine that builds elaboration
entities, which seems simpler and more robust than adjusting the ghost mode in
all callers of this routine.
gcc/ada/ChangeLog:
* sem_util.adb (Build_Elaboration_Entity): Set ghost mode to none
before creating the elaboration entity; restore the ghost mode
afterwards.
Javier Miranda [Tue, 17 Jun 2025 13:09:11 +0000 (13:09 +0000)]
ada: Array aggregates of mutably tagged objects (part 2)
gcc/ada/ChangeLog:
* exp_aggr.adb (Gen_Assign): Code cleanup.
(Initialize_Component): Do not adjust the tag when the type of
the aggregate components is a mutably tagged type.
Jonathan Wakely [Thu, 5 Jun 2025 11:05:19 +0000 (12:05 +0100)]
libstdc++: Add std::inplace_vector for C++26 (P0843R14) [PR119137]
Implement std::inplace_vector as specified in P0843R14, without follow-up
papers, in particular P3074R7 (trivial unions). Consequently,
inplace_vector<T, N> can be used inside constant evaluations only
if T is trivial or N is equal to zero.
We provide a separate specialization for inplace_vector<T, 0> to meet
the requirements of N5008 [inplace.vector.overview] p5. In particular,
objects of such types need to be empty.
To allow a constexpr inplace_vector variable v where v.size() < v.capacity(),
we need to guarantee that all elements of the storage array are initialized,
even those in the range [v.data() + v.size(), v.data() + v.capacity()). This is
performed by the _M_init function, which is called by each constructor. By storing
the array in an anonymous union, we can perform this initialization in constant
evaluation, avoiding any impact on the runtime path.
The size() function conveys to the compiler that _M_size <= _Nm
by calling __builtin_unreachable(). In particular, this allows us to eliminate
false-positive warnings by using _Nm - size() instead of _Nm - _M_size when
computing the number of available elements.
The included tests cover almost all code paths at runtime; however, some
compile-time evaluation tests are not yet implemented:
* operations on ranges, which depend on making testsuite_iterators constexpr
* negative tests for invoking operations with preconditions at compile time,
especially for the zero-size specialization.
PR libstdc++/119137
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (INPUT): Add new header.
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/stl_iterator_base_types.h (__any_input_iterator):
Define.
* include/bits/version.def (inplace_vector): Define.
* include/bits/version.h: Regenerate.
* include/precompiled/stdc++.h: Include new header.
* src/c++23/std.cc.in: Export contents of new header.
* include/std/inplace_vector: New file.
* testsuite/23_containers/inplace_vector/access/capacity.cc: New file.
* testsuite/23_containers/inplace_vector/access/elem.cc: New file.
* testsuite/23_containers/inplace_vector/access/elem_neg.cc: New file.
* testsuite/23_containers/inplace_vector/cons/1.cc: New file.
* testsuite/23_containers/inplace_vector/cons/from_range.cc: New file.
* testsuite/23_containers/inplace_vector/cons/throws.cc: New file.
* testsuite/23_containers/inplace_vector/copy.cc: New file.
* testsuite/23_containers/inplace_vector/erasure.cc: New file.
* testsuite/23_containers/inplace_vector/modifiers/assign.cc: New file.
* testsuite/23_containers/inplace_vector/modifiers/erase.cc: New file.
* testsuite/23_containers/inplace_vector/modifiers/multi_insert.cc:
New file.
* testsuite/23_containers/inplace_vector/modifiers/single_insert.cc:
New file.
* testsuite/23_containers/inplace_vector/move.cc: New file.
* testsuite/23_containers/inplace_vector/relops.cc: New file.
* testsuite/23_containers/inplace_vector/version.cc: New file.
* testsuite/util/testsuite_iterators.h (input_iterator_wrapper::base):
Define.
Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Co-authored-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Richard Biener [Thu, 17 Jul 2025 06:51:11 +0000 (08:51 +0200)]
tree-optimization/121048 - move check for only having vector(1)
The following moves the rejection of loop vectorization with vector(1)
typed vectors from the initial vector type determination to after
SLP discovery, when we can check whether there's any instance
with other than vector(1) vectors. For RVV at least, vector(1)
instances serve as a limited way to support partial loop vectorization.
The following restores this.
PR tree-optimization/121048
* tree-vect-loop.cc (vect_determine_vectype_for_stmt_1):
Remove rejecting vector(1) vector types.
(vect_set_stmts_vectype): Likewise.
* tree-vect-slp.cc (vect_make_slp_decision): Only
count instances with non-vector(1) root towards whether
we have any interesting instances to vectorize.
Jakub Jelinek [Fri, 18 Jul 2025 07:20:30 +0000 (09:20 +0200)]
gimple-fold: Fix up big endian _BitInt adjustment [PR121131]
The following testcase ICEs because SCALAR_INT_TYPE_MODE of course
doesn't work for large BITINT_TYPE types, which have BLKmode.
In cases like these, native_encode* (as well as e.g. r14-8276) uses
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE ()) for the non-BLKmode types and
TREE_INT_CST_LOW (TYPE_SIZE_UNIT ()) for the BLKmode ones.
In this case we want bits rather than bytes, so I've used
GET_MODE_BITSIZE as before and TYPE_SIZE otherwise.
Furthermore, the patch only computes encoding_size for big-endian
targets; for little endian we don't really adjust anything, so there
is no point computing it.
2025-07-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121131
* gimple-fold.cc (fold_nonarray_ctor_reference): Use
TREE_INT_CST_LOW (TYPE_SIZE ()) instead of
GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE ()) for BLKmode BITINT_TYPEs.
Don't compute encoding_size at all for little endian targets.
Andrew Pinski [Wed, 16 Jul 2025 16:31:35 +0000 (09:31 -0700)]
gcse: Skip hardreg pre when the hardreg is never live [PR121095]
r15-6789-ge7f98d9603808b added a new RTL pass for hardreg PRE for the hard register
FPM_REGNUM. This pass can get expensive if there is a large number of basic blocks,
and if the hard register is never live it does nothing in the end.
In the aarch64 case, FPM_REGNUM is only used for FP8-related code, so it is quite likely
not to be used at all. Skipping the pass for that register can therefore improve both
compile time and memory usage.
Built and tested for aarch64-linux-gnu.
PR middle-end/121095
gcc/ChangeLog:
* gcse.cc (execute_hardreg_pre): Skip if the hardreg is never live.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
libstdc++: Fix forwarding of custom IndexType in mdspan [PR121061]
The second bug reported in PR121061 is that the conversion of a custom
OtherIndexType to IndexType is incorrectly not done via r-value
references.
This commit fixes the forwarding issue and adds a custom IndexType called
RValueInt, which only allows conversion to int via an r-value reference.
PR libstdc++/121061
libstdc++-v3/ChangeLog:
* include/std/mdspan (extents::extents): Perform conversion to
index_type via an r-value reference.
(layout_left::mapping::operator()): Ditto.
(layout_right::mapping::operator()): Ditto.
(layout_stride::mapping::operator()): Ditto.
* testsuite/23_containers/mdspan/extents/custom_integer.cc: Add
tests for RValueInt and MutatingInt.
* testsuite/23_containers/mdspan/int_like.h (RValueInt): Add.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Test with
RValueInt.
* testsuite/23_containers/mdspan/mdspan.cc: Ditto.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
libstdc++: Fix constraint for custom integer types in mdspan [PR121061]
PR121061 consists of two bugs in mdspan-related code. This commit fixes
the first one: when a custom IndexType is passed as an array or
span, the conversion to int must be const. Prior to this commit, the
constraint incorrectly also allowed non-const conversion. This commit
updates all related constraints accordingly.
PR121061 shows that the test coverage for custom integer types is
insufficient. Custom IndexTypes are passed to mdspan related objects in
one of two ways:
* as a template parameter pack,
* or as an array/span.
These two cases have different requirements on the (constness of) custom
IndexTypes. Therefore, the tests are restructured as follows:
* allow testing with different custom integers,
* separate code that tests the two cases described above,
* use int_like.h for all tests with custom integers.
The affected tests are for:
* creating extents, layout_stride::mapping and mdspan from
custom integers,
* mapping::operator() and mdspan::operator[].
PR libstdc++/121061
libstdc++-v3/ChangeLog:
* testsuite/23_containers/mdspan/extents/custom_integer.cc:
Enable checking with different custom integers. Improve
checking non-existence of overloads for incompatible custom
integers.
* testsuite/23_containers/mdspan/layouts/mapping.cc: ditto. Also
improve reuse of int_like.h.
* testsuite/23_containers/mdspan/layouts/stride.cc: ditto.
* testsuite/23_containers/mdspan/mdspan.cc: ditto.
* testsuite/23_containers/mdspan/extents/int_like.h: Rename (old
name).
* testsuite/23_containers/mdspan/int_like.h: Rename (new name).
(ThrowingInt): Add.
(NotIntLike): Add.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Filip Kastl [Thu, 17 Jul 2025 12:52:59 +0000 (14:52 +0200)]
tree-ssa-structalias / pta: Fix *more* GNU coding style deviations
This continues my previous commit, where I fixed some deviations from
GNU coding style in pta files. This should fix all the remaining issues
that contrib/check_GNU_style.py can detect (excluding false positives).
Committing as obvious.
gcc/ChangeLog:
* tree-ssa-structalias.cc (lookup_vi_for_tree): Fix GNU style.
(process_constraint): Fix GNU style.
(get_constraint_for_component_ref): Fix GNU style.
(get_constraint_for_1): Fix GNU style.
(get_function_part_constraint): Fix GNU style.
(handle_lhs_call): Fix GNU style.
(find_func_aliases_for_builtin_call): Fix GNU style.
(find_func_aliases): Fix GNU style.
(find_func_clobbers): Fix GNU style.
(struct shared_bitmap_hasher): Fix GNU style.
(shared_bitmap_hasher::hash): Fix GNU style.
(pt_solution_includes_global): Fix GNU style.
(init_base_vars): Fix GNU style.
(visit_loadstore): Fix GNU style.
(compute_dependence_clique): Fix GNU style.
(struct pt_solution): Fix GNU style.
(ipa_pta_execute): Fix GNU style.
Thomas Schwinge [Wed, 16 Jul 2025 20:13:46 +0000 (22:13 +0200)]
GCN, nvptx offloading: Restrain 'WARNING: program timed out.' while in 'dynamic_cast' only for effective-target 'offload_device' [PR119692]
In PR119692 "C++ 'typeinfo', 'vtable' vs. OpenACC, OpenMP 'target' offloading":
> --- Comment #8 from Rainer Orth <ro at gcc dot gnu.org> ---
> The last commit made things worse on sparc-sun-solaris2.11: since that one
> (dg-timeout 10) I regularly get
>
> WARNING: libgomp.c++/target-exceptions-bad_cast-1.C (test for excess errors)
> program timed out.
> FAIL: libgomp.c++/target-exceptions-bad_cast-1.C (test for excess errors)
> UNRESOLVED: libgomp.c++/target-exceptions-bad_cast-1.C compilation failed to produce executable
> UNRESOLVED: libgomp.c++/target-exceptions-bad_cast-1.C scan-tree-dump-times optimized "gimple_call <__cxa_bad_cast, " 1
>
> Before that, the test had no issue. Compiling the test on an unloaded system
> usually takes less than 1 sec, but when fully loaded, times can go up.
To keep things simple, let's restrict this temporary (yeah...) workaround to
apply only for effective-target 'offload_device', just like the
'dg-xfail-run-if' itself.
Filip Kastl [Thu, 17 Jul 2025 12:30:11 +0000 (14:30 +0200)]
tree-ssa-structalias / pta: Fix some GNU coding style deviations
Fix some deviations from GNU coding style in pta files as reported by
contrib/check_GNU_style.py. Most of these are "dot, space, space, end
of comment".
Committing as obvious.
gcc/ChangeLog:
* pta-andersen.cc (struct constraint_graph): Fix GNU style.
(constraint_equal): Fix GNU style.
(set_union_with_increment): Fix GNU style.
(insert_into_complex): Fix GNU style.
(merge_node_constraints): Fix GNU style.
(unify_nodes): Fix GNU style.
(do_ds_constraint): Fix GNU style.
(scc_info::scc_info): Fix GNU style.
(find_indirect_cycles): Fix GNU style.
(equiv_class_lookup_or_add): Fix GNU style.
(label_visit): Fix GNU style.
(dump_pred_graph): Fix GNU style.
(perform_var_substitution): Fix GNU style.
(eliminate_indirect_cycles): Fix GNU style.
(solve_graph): Fix GNU style.
(solve_constraints): Fix GNU style.
* tree-ssa-structalias.cc (first_vi_for_offset): Fix GNU style.
(debug_constraint): Fix GNU style.
* tree-ssa-structalias.h (struct constraint_expr): Fix GNU
style.
(struct variable_info): Fix GNU style.
H.J. Lu [Sun, 2 Mar 2025 01:10:57 +0000 (09:10 +0800)]
x86: Don't change mode for XOR in ix86_expand_ternlog
There is no need to change mode for XOR in ix86_expand_ternlog now.
Whatever the original reasons for it were, they no longer exist. Tested
on x86-64 with -m32. There are no regressions.
* config/i386/i386-expand.cc (ix86_expand_ternlog): Don't change
mode for XOR.
Filip Kastl [Thu, 17 Jul 2025 11:29:50 +0000 (13:29 +0200)]
tree-ssa-structalias: Put solver into its own file
This patch cuts out the points-to solver from tree-ssa-structalias.cc
and places it into a new file pta-andersen.cc. It is the first part of
my effort to split tree-ssa-structalias.cc into smaller parts.
I had to give external linkage to some static functions and variables.
I put those in the new header files tree-ssa-structalias.h and
pta-andersen.h. Those header files are meant as an internal interface
between parts of the points-to analyzer.
Some functions and variables already had external linkage and were declared in
tree-ssa-alias.h. I considered moving the declarations to
tree-ssa-structalias.h but decided to leave them as is. I see those functions
and variables as an external interface -- facing outwards to the rest of the
compiler.
For the internal interface, I made a new namespace "pointer_analysis".
I didn't want to clutter the global namespace and possibly run into ODR
problems.
I wanted to encapsulate the constraint graph within the solver. To achieve
that, I had to make some changes beyond just moving things around. They were
only very small changes, though:
- Add delete_graph() which gets called at the end of solve_constraints()
- Problem: The solver assigns representatives to variables (union-find). To
then get the solution for variable v, one has to look up the representative
of v. The information needed to look up the representative is part of the
graph.
- Solution: Let the solver output an array that maps variables to their
representatives and let this array outlive the graph (array var_rep).
- Constructing the array means doing find() for every variable. That should
amortize to O(size of the union-find structure). So this won't hurt the
asymptotic time complexity.
- We replace all calls to find(var) in tree-ssa-structalias.cc with
just an array lookup var_rep[var].
- predbitmap_obstack gets initialized in init_graph().
gcc/ChangeLog:
* Makefile.in: Add pta-andersen.o.
* tree-ssa-structalias.cc (create_variable_info_for): Just move
around.
(unify_nodes): Move to pta-andersen.cc.
(struct constraint): Move to tree-ssa-structalias.h.
(EXECUTE_IF_IN_NONNULL_BITMAP): Move to pta-andersen.cc.
(struct variable_info): Move to tree-ssa-structalias.h.
(struct constraint_stats): Move to tree-ssa-structalias.h.
(first_vi_for_offset): External linkage, move to namespace
pointer_analysis.
(first_or_preceding_vi_for_offset): External linkage, move to namespace
pointer_analysis.
(dump_constraint): External linkage, move to namespace
pointer_analysis.
(debug_constraint): External linkage, move to namespace
pointer_analysis.
(dump_constraints): External linkage, move to namespace
pointer_analysis.
(debug_constraints): External linkage, move to namespace
pointer_analysis.
(lookup_vi_for_tree): Move around inside tree-ssa-structalias.cc.
(type_can_have_subvars): Move around inside tree-ssa-structalias.cc.
(make_param_constraints): Move around inside tree-ssa-structalias.cc.
(dump_solution_for_var): External linkage, move to namespace
pointer_analysis. find (...) -> var_rep[...].
(get_varinfo): Move to tree-ssa-structalias.h.
(debug_solution_for_var): External linkage, move to namespace
pointer_analysis.
(vi_next): Move to tree-ssa-structalias.h.
(dump_sa_stats): External linkage, move to namespace pointer_analysis.
(new_var_info): Just move around.
(dump_sa_points_to_info): External linkage, move to namespace
pointer_analysis.
(debug_sa_points_to_info): External linkage, move to namespace
pointer_analysis.
(get_call_vi): Just move around.
(dump_varinfo): External linkage, move to namespace pointer_analysis.
(lookup_call_use_vi): Just move around.
(lookup_call_clobber_vi): Just move around.
(get_call_use_vi): Just move around.
(get_call_clobber_vi): Just move around.
(enum constraint_expr_type): Move to tree-ssa-structalias.h.
(struct constraint_expr): Move to tree-ssa-structalias.h.
(UNKNOWN_OFFSET): Move to tree-ssa-structalias.h.
(get_constraint_for_1): Just move around.
(get_constraint_for): Just move around.
(get_constraint_for_rhs): Just move around.
(do_deref): Just move around.
(constraint_pool): Just move around.
(struct constraint_graph): Move to pta-andersen.h.
(FIRST_REF_NODE): Move to pta-andersen.cc.
(LAST_REF_NODE): Move to pta-andersen.cc.
(find): Move to pta-andersen.cc.
(unite): Move to pta-andersen.cc.
(new_constraint): Just move around.
(debug_constraint_graph): External linkage, move to namespace
pointer_analysis.
(debug_varinfo): External linkage, move to namespace pointer_analysis.
(debug_varmap): External linkage, move to namespace pointer_analysis.
(dump_constraint_graph): External linkage, move to namespace
pointer_analysis.
(constraint_expr_equal): Move to pta-andersen.cc.
(constraint_expr_less): Move to pta-andersen.cc.
(constraint_less): Move to pta-andersen.cc.
(constraint_equal): Move to pta-andersen.cc.
(constraint_vec_find): Move to pta-andersen.cc.
(constraint_set_union): Move to pta-andersen.cc.
(solution_set_expand): Move to pta-andersen.cc.
(set_union_with_increment): Move to pta-andersen.cc.
(insert_into_complex): Move to pta-andersen.cc.
(merge_node_constraints): Move to pta-andersen.cc.
(clear_edges_for_node): Move to pta-andersen.cc.
(merge_graph_nodes): Move to pta-andersen.cc.
(add_implicit_graph_edge): Move to pta-andersen.cc.
(add_pred_graph_edge): Move to pta-andersen.cc.
(add_graph_edge): Move to pta-andersen.cc.
(init_graph): Move to pta-andersen.cc. Initialize
predbitmap_obstack here.
(build_pred_graph): Move to pta-andersen.cc.
(build_succ_graph): Move to pta-andersen.cc.
(class scc_info): Move to pta-andersen.cc.
(scc_visit): Move to pta-andersen.cc.
(solve_add_graph_edge): Move to pta-andersen.cc.
(do_sd_constraint): Move to pta-andersen.cc.
(do_ds_constraint): Move to pta-andersen.cc.
(do_complex_constraint): Move to pta-andersen.cc.
(scc_info::scc_info): Move to pta-andersen.cc.
(scc_info::~scc_info): Move to pta-andersen.cc.
(find_indirect_cycles): Move to pta-andersen.cc.
(topo_visit): Move to pta-andersen.cc.
(compute_topo_order): Move to pta-andersen.cc.
(struct equiv_class_hasher): Move to pta-andersen.cc.
(equiv_class_hasher::hash): Move to pta-andersen.cc.
(equiv_class_hasher::equal): Move to pta-andersen.cc.
(equiv_class_lookup_or_add): Move to pta-andersen.cc.
(condense_visit): Move to pta-andersen.cc.
(label_visit): Move to pta-andersen.cc.
(dump_pred_graph): External linkage, move to namespace
pointer_analysis.
(dump_varmap): External linkage, move to namespace pointer_analysis.
(perform_var_substitution): Move to pta-andersen.cc.
(free_var_substitution_info): Move to pta-andersen.cc.
(find_equivalent_node): Move to pta-andersen.cc.
(unite_pointer_equivalences): Move to pta-andersen.cc.
(move_complex_constraints): Move to pta-andersen.cc.
(rewrite_constraints): Move to pta-andersen.cc.
(eliminate_indirect_cycles): Move to pta-andersen.cc.
(solve_graph): Move to pta-andersen.cc.
(set_uids_in_ptset): find (...) -> var_rep[...].
(find_what_var_points_to): find (...) -> var_rep[...].
(init_alias_vars): Don't initialize predbitmap_obstack here.
(remove_preds_and_fake_succs): Move to pta-andersen.cc.
(solve_constraints): Move to pta-andersen.cc. Call
delete_graph() at the end.
(delete_points_to_sets): Don't delete graph here. Delete var_rep here.
(visit_loadstore): find (...) -> var_rep[...].
(compute_dependence_clique): find (...) -> var_rep[...].
(ipa_pta_execute): find (...) -> var_rep[...].
* pta-andersen.cc: New file.
* pta-andersen.h: New file.
* tree-ssa-structalias.h: New file.
aarch64: Adapt unwinder to linux's SME signal behaviour
SME uses a lazy save system to manage ZA. The idea is that,
if a function with ZA state wants to call a "normal" function,
it can leave its state in ZA and instead set up a lazy save buffer.
If, unexpectedly, that normal function contains a nested use of ZA,
that nested use of ZA must commit the lazy save first.
This lazy save system uses a special system register called TPIDR2_EL0.
See:
The ABI specifies that, on entry to an exception handler, the following
things must be true:
* PSTATE.SM must be 0 (the processor must be in non-streaming mode)
* PSTATE.ZA must be 0 (ZA must be off)
* TPIDR2_EL0 must be 0 (there must be no uncommitted lazy save)
This is normally done by making _Unwind_RaiseException & friends
commit any lazy save before they unwind. This also has the side
effect of ensuring that TPIDR2_EL0 is never left pointing to a
lazy save buffer that has been unwound.
However, things get more complicated with signals. If:
(a) a signal is raised while ZA is dormant (that is, while there is an
uncommitted lazy save);
(b) the signal handler throws an exception; and
(c) that exception is caught outside the signal handler
something must ensure that the lazy save from (a) is committed.
This would be simple if the signal handler was entered with ZA and
TPIDR2_EL0 intact. However, for various good reasons that are out
of scope here, this is not done. Instead, Linux now clears both
TPIDR2_EL0 and PSTATE.ZA before entering a signal handler, see:
Therefore, it is the unwinder that must simulate a commit of the lazy
save from (a). It can do this by reading the previous values of
TPIDR2_EL0 and ZA from the sigcontext.
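The simulated commit can be sketched as a small C model. It is not the code in linux-unwind.h. The struct and function names are hypothetical, and ZA is modeled as a plain byte array rather than read from a real sigcontext. The TPIDR2 block layout (a save-buffer pointer followed by a 16-bit slice count, padded to 16 bytes) follows the SME ABI as we understand it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the TPIDR2 block that TPIDR2_EL0 points to
   while a lazy save is pending.  */
struct tpidr2_block {
  uint8_t *za_save_buffer;       /* where a commit must store ZA */
  uint16_t num_za_save_slices;   /* how many ZA slices to store */
  uint8_t reserved[6];
};

/* Hypothetical snapshot of what the unwinder can recover from the
   signal frame of the interrupted code.  */
struct sme_sigframe_model {
  uintptr_t tpidr2_el0;          /* 0 means "no uncommitted lazy save" */
  int za_was_active;             /* PSTATE.ZA when the signal arrived */
  const uint8_t *za_contents;    /* saved ZA contents */
  size_t za_slice_bytes;         /* bytes per ZA slice */
};

/* Simulate committing the lazy save: if ZA was dormant (on, with a
   lazy save pending), copy the saved ZA slices into the buffer named
   by the TPIDR2 block.  Returns the new TPIDR2_EL0 value, which is
   always 0 afterwards.  */
static uintptr_t
commit_lazy_save_from_sigframe (const struct sme_sigframe_model *f)
{
  if (f->tpidr2_el0 != 0 && f->za_was_active)
    {
      struct tpidr2_block *blk = (struct tpidr2_block *) f->tpidr2_el0;
      memcpy (blk->za_save_buffer, f->za_contents,
              (size_t) blk->num_za_save_slices * f->za_slice_bytes);
    }
  return 0;
}
```

After this step, the three ABI conditions for entering an exception handler can hold again even though the signal handler itself never saw the lazy save.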
The SME-related sigcontext structures were only added to Linux's
asm/sigcontext.h relatively recently, and we can't rely on GCC being
built against such recent kernel header files. The patch therefore
defines the relevant macros if they are not already defined and provides
types that comply with the ABI layout of the corresponding Linux types.
The patch includes some ugly casting in an attempt to support big-endian
ILP32, even though SME on big-endian ILP32 linux should never be a thing.
We can remove it if we also remove ILP32 support from GCC.
gcc/testsuite/
* lib/target-supports.exp (add_options_for_aarch64_sme)
(check_effective_target_aarch64_sme_hw): New procedures.
* g++.target/aarch64/sme/sme_throw_1.C: New test.
* g++.target/aarch64/sme/sme_throw_2.C: Likewise.
libgcc/
* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
If a signal was raised while there was an uncommitted lazy save,
commit the save as part of the unwind process.
Currently, for a signbit operation, the instructions tc{f,d,x}b + ipm + srl
are emitted. If the source operand is a MEM, a load precedes the
sequence. A faster implementation issues a load from either a
REG or a MEM into a GPR, followed by a shift.
In the spirit of the signbit function of the C standard, the signbit optab
only guarantees that the resulting value is nonzero if the sign bit is
set. The common code implementation computes a value where the sign bit
is stored in the most significant bit, i.e., all other bits are just
masked out, whereas the current s390 implementation results in a
value where the sign bit is stored in the least significant bit.
Although there is no guarantee where the sign bit is stored, keep the
current behaviour and, therefore, implement the signbit optab manually.
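The two conventions can be contrasted in portable C, assuming an IEEE 754 binary64 `double`. The function names are illustrative only. Both results are nonzero exactly when the sign bit is set, which is all the optab promises.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Common-code style: leave the sign bit in place (bit 63) and mask
   out every other bit.  */
static uint64_t
signbit_msb (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits & (UINT64_C (1) << 63);
}

/* s390 style kept by the patch: move the sign bit down to bit 0 with a
   logical right shift by 63.  */
static uint64_t
signbit_lsb (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits >> 63;
}
```

For example, `signbit_lsb (-0.0)` is 1 while `signbit_msb (-0.0)` is `1 << 63`; callers that only test for nonzero cannot tell them apart.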
Since z10, instruction lgdr can be used effectively for a 64-bit
FPR-to-GPR load. However, there is no 32-bit counterpart. Thus, for
target z10, make use of post-reload splitters which emit either a 64-bit
or a 32-bit load, depending on whether the source operand is a REG or a
MEM, and a corresponding 63- or 31-bit shift. We can do without a
post-reload splitter with vector extensions, since there we also
have a 32-bit VR-to-GPR load via instruction vlgvf.
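Why the shift amount depends on the operand kind can be modeled in C for a 32-bit `float`. This relies on the assumption that a short floating-point value occupies the leftmost (high-order) 32 bits of a 64-bit s390 FPR. The functions are a hypothetical model, not the splitters from s390.md.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a 64-bit FPR holding a float: the 32-bit value sits in the
   high-order half, the low-order half is ignored (zero here).  */
static uint64_t
fpr_from_float (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (uint64_t) bits << 32;
}

/* REG source: a 64-bit FPR-to-GPR move (as lgdr does) puts the float's
   sign bit at bit 63, so shift right by 63.  */
static uint64_t
signbit_float_from_reg (uint64_t fpr)
{
  uint64_t gpr = fpr;
  return gpr >> 63;
}

/* MEM source: a 32-bit load puts the sign bit at bit 31, so shift
   right by 31.  */
static uint64_t
signbit_float_from_mem (const float *p)
{
  uint32_t word;
  memcpy (&word, p, sizeof word);
  return (uint64_t) (word >> 31);
}
```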
gcc/ChangeLog:
* config/s390/s390.md (signbit_tdc): Rename expander.
(signbit<mode>2): New expander.
(signbit<mode>2_z10): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/s390/isfinite-isinf-isnormal-signbit-2.c: Adapt
scan assembler directives.
* gcc.target/s390/isfinite-isinf-isnormal-signbit-3.c: Ditto.
* gcc.target/s390/signbit-1.c: New test.
* gcc.target/s390/signbit-2.c: New test.
* gcc.target/s390/signbit-3.c: New test.
* gcc.target/s390/signbit-4.c: New test.
* gcc.target/s390/signbit-5.c: New test.
* gcc.target/s390/signbit.h: New test.
Moving between GPRs and VRs in any mode with a size less than or equal to
8 bytes becomes available with vector extensions. Without adapting the
costs for those moves, we typically go through memory.
gcc/ChangeLog:
* config/s390/s390.cc (s390_register_move_cost): Add costing for
vlvg/vlgv.
Exploit the fact that instruction VLGV zeroes the excess high-order bits of the target GPR.
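A C model of that property shows why a separate zero-extension after VLGV is redundant. The function is hypothetical; it assumes big-endian element numbering (element 0 is leftmost) and element sizes of 1, 2, 4 or 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VLGV model: extract element IDX of ESIZE bytes from a
   16-byte vector register and return it right-aligned in a 64-bit
   GPR.  Every bit above 8*ESIZE is zero, i.e. the value is already
   zero-extended.  */
static uint64_t
vlgv_model (const uint8_t vr[16], size_t esize, size_t idx)
{
  uint64_t r = 0;
  for (size_t i = 0; i < esize; i++)
    r = (r << 8) | vr[idx * esize + i];
  return r;
}
```

Casting the result to a narrower unsigned type and back therefore never changes it, which is the redundancy the new `*mov..._zero_extend` patterns exploit.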
gcc/ChangeLog:
* config/s390/vector.md (bhfgq): Add scalar modes.
(*movdi<mode>_zero_extend_A): New insn.
(*movsi<mode>_zero_extend_A): New insn.
(*movdi<mode>_zero_extend_B): New insn.
(*movsi<mode>_zero_extend_B): New insn.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vlgv-zero-extend-1.c: New test.
Jonathan Wakely [Tue, 15 Jul 2025 09:18:11 +0000 (10:18 +0100)]
libstdc++: Add comments to __unicode::_Utf_iterator
Add comments documenting what it does and how it does it.
Also reorder the if-else in operator++ so that we check whether to
iterate over code units in the local buffer before checking whether to
refill that buffer. That seems the more natural way to structure the
function.
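The reordered increment can be sketched in C (kept in C for consistency with the other examples; the real `_Utf_iterator` is a C++ class template). This hypothetical model yields UTF-8 code units from UTF-32 input through a small local buffer, and its increment first checks for remaining units in the buffer before refilling it. Input validation is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct utf8_iter {
  const uint32_t *cur, *end;   /* remaining input code points */
  uint8_t buf[4];              /* UTF-8 units of the current code point */
  uint8_t buf_index, buf_len;  /* buf_len == 0 means "at end" */
};

/* Encode the current code point into the local buffer.  */
static void
utf8_refill (struct utf8_iter *it)
{
  uint32_t c = *it->cur;
  if (c < 0x80)
    { it->buf[0] = (uint8_t) c; it->buf_len = 1; }
  else if (c < 0x800)
    {
      it->buf[0] = (uint8_t) (0xC0 | (c >> 6));
      it->buf[1] = (uint8_t) (0x80 | (c & 0x3F));
      it->buf_len = 2;
    }
  else if (c < 0x10000)
    {
      it->buf[0] = (uint8_t) (0xE0 | (c >> 12));
      it->buf[1] = (uint8_t) (0x80 | ((c >> 6) & 0x3F));
      it->buf[2] = (uint8_t) (0x80 | (c & 0x3F));
      it->buf_len = 3;
    }
  else
    {
      it->buf[0] = (uint8_t) (0xF0 | (c >> 18));
      it->buf[1] = (uint8_t) (0x80 | ((c >> 12) & 0x3F));
      it->buf[2] = (uint8_t) (0x80 | ((c >> 6) & 0x3F));
      it->buf[3] = (uint8_t) (0x80 | (c & 0x3F));
      it->buf_len = 4;
    }
  it->buf_index = 0;
}

static void
utf8_iter_init (struct utf8_iter *it, const uint32_t *b, const uint32_t *e)
{
  it->cur = b;
  it->end = e;
  it->buf_index = it->buf_len = 0;
  if (b != e)
    utf8_refill (it);
}

static int utf8_iter_done (const struct utf8_iter *it) { return it->buf_len == 0; }
static uint8_t utf8_iter_deref (const struct utf8_iter *it) { return it->buf[it->buf_index]; }

static void
utf8_iter_next (struct utf8_iter *it)
{
  /* First: are there more code units left in the local buffer?  */
  if (it->buf_index + 1 < it->buf_len)
    {
      ++it->buf_index;
      return;
    }
  /* Otherwise advance to the next code point and refill, or stop.  */
  ++it->cur;
  if (it->cur != it->end)
    utf8_refill (it);
  else
    it->buf_index = it->buf_len = 0;
}
```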
libstdc++-v3/ChangeLog:
* include/bits/unicode.h (__unicode::_Utf_iterator): Add
comments.
(__unicode::_Utf_iterator::operator++()): Check whether to
iterate over the buffer first.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Xi Ruoyao [Mon, 14 Jul 2025 19:01:12 +0000 (03:01 +0800)]
LoongArch: Fix wrong code generated by TARGET_VECTORIZE_VEC_PERM_CONST [PR121064]
When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the
same pseudo as op0 and/or op1. Loading the selector into target
would clobber the input, producing wrong code like
```
vld $vr0, $t0
vshuf.w $vr0, $vr0, $vr1
```
So don't load the selector into d->target; use a new pseudo to hold the
selector instead. The reload pass will put the pseudo for the selector and
the pseudo for the target into the same hard register anyway (following our
constraint '0' on the shuf instructions).
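The aliasing hazard can be reproduced with a simplified C model. The shuffle indexes into the concatenation of op0 and op1, which is an assumption for illustration and not the exact vshuf.w definition; like the real instruction, it reads its selector from the destination. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NLANES 4

/* Output lane I takes lane DST[I] of OP0 ++ OP1; the selector is read
   from DST, mirroring the tied '0' constraint.  */
static void
shuf_in_place (int dst[NLANES], const int *op0, const int *op1)
{
  int out[NLANES];
  for (size_t i = 0; i < NLANES; i++)
    {
      int s = dst[i];
      out[i] = s < NLANES ? op0[s] : op1[s - NLANES];
    }
  memcpy (dst, out, sizeof out);
}

/* Buggy expansion: load the selector straight into TARGET, which
   clobbers OP0 when both name the same storage.  */
static void
expand_buggy (int *target, const int *op0, const int *op1, const int *sel)
{
  memcpy (target, sel, NLANES * sizeof (int));
  shuf_in_place (target, op0, op1);
}

/* Fixed expansion: hold the selector in a fresh temporary and write
   TARGET only after the inputs have been read.  */
static void
expand_fixed (int *target, const int *op0, const int *op1, const int *sel)
{
  int tmp[NLANES];
  memcpy (tmp, sel, sizeof tmp);
  shuf_in_place (tmp, op0, op1);
  memcpy (target, tmp, sizeof tmp);
}
```

Calling both with `target == op0` shows the difference: the fixed version produces the intended lanes, while the buggy one shuffles selector values instead of the original op0 elements.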
gcc/ChangeLog:
PR target/121064
* config/loongarch/lsx.md (lsx_vshuf_<lsxfmt_f>): Add '@' to
generate a mode-aware helper. Use <VIMODE> as the mode of the
operand 1 (selector).
* config/loongarch/lasx.md (lasx_xvshuf_<lasxfmt_f>): Likewise.
* config/loongarch/loongarch.cc
(loongarch_try_expand_lsx_vshuf_const): Create a new pseudo for
the selector. Use the mode-aware helper to simplify the code.
(loongarch_expand_vec_perm_const): Likewise.
gcc/testsuite/ChangeLog:
PR target/121064
* gcc.target/loongarch/pr121064.c: New test.
Richard Biener [Thu, 10 Jul 2025 11:04:00 +0000 (13:04 +0200)]
Reject single lane vector types for SLP build
The following makes us never consider vector(1) T types for
vectorization and ensures this during SLP build. This is a
long-standing issue for BB vectorization and when we remove
early loop vector type setting we lose the single place we have
that rejects this for loops.
Once we implement partial loop vectorization we should revisit
this, but then use the original scalar types for the unvectorized
parts.
Richard Biener [Wed, 16 Jul 2025 18:19:44 +0000 (20:19 +0200)]
tree-optimization/121035 - handle stray VN values without expression
When VN iterates we can end up with unreachable inserted expressions
in the expression tables, which in turn will not be added to their
value by PRE's compute_avail. This will later ICE when we pick
them up and want to generate them. Deal with this by giving up.
PR tree-optimization/121035
* tree-ssa-pre.cc (find_or_generate_expression): Handle
values without expression.
> So looking into this further, MACHMODE_H was part of LIBGCC_DEPS
> because of TM_H and r0-78222-gfa9585134f6f58 moved away from including
> tm.h from libgcc. It was copied over unused.
It is indeed used then.
(For background context, my overall goal here is for libgcc to depend on
fewer (or no) things generated by `gcc/Makefile`. This is me trying to
pluck some low-hanging fruit: this is the only direct mention of
`insn-modes.h` in libgcc.)
Co-Developed-by: H.J. Lu <hjl.tools@gmail.com>
gcc/
PR target/121062
* config/i386/i386.cc (ix86_convert_const_vector_to_integer):
Handle E_V1SImode and E_V1DImode.
* config/i386/mmx.md (V_16_32_64): Add V1SI, V2BF and V1DI.
(mmxinsnmode): Add V1DI and V1SI.
Add V_16_32_64 splitter for constant vector loads from constant
vector pool.
(V_16_32_64:*mov<mode>_imm): Moved after V_16_32_64 splitter.
Replace lowpart_subreg with adjust_address.
i386: Use various predicates instead of open coding them
No functional changes.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_move):
Use MEM_P predicate instead of open coding it.
(ix86_erase_embedded_rounding):
Use NONJUMP_INSN_P predicate instead of open coding it.
* config/i386/i386-features.cc (convertible_comparison_p):
Use REG_P predicate instead of open coding it.
* config/i386/i386.cc (ix86_rtx_costs):
Use SUBREG_P predicate instead of open coding it.
Richard Biener [Wed, 16 Jul 2025 13:07:58 +0000 (15:07 +0200)]
tree-optimization/121049 - avoid loop masking with even/odd reduction
The following disables loop masking when we are using an even/odd
widening operation in a reduction because the loop mask then aligns
to the wrong elements.
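A simplified C model shows the misalignment. It assumes 8 narrow input lanes widened into two 4-lane halves and a final partial iteration with only the first 3 lanes active; the functions are hypothetical and only mimic which input lane each masked output lane carries.

```c
#include <assert.h>
#include <stdint.h>

#define NLANES 8            /* narrow (int16_t) lanes per input vector */
#define HALF (NLANES / 2)   /* wide lanes per output vector */

/* Reference: sum exactly the input lanes the loop mask marks active.  */
static int32_t
sum_reference (const int16_t x[NLANES], const int mask[NLANES])
{
  int32_t s = 0;
  for (int i = 0; i < NLANES; i++)
    if (mask[i])
      s += x[i];
  return s;
}

/* Lo/hi widening: output half H holds input lanes H*HALF .. H*HALF+HALF-1,
   so splitting the mask the same way keeps lanes matched.  */
static int32_t
sum_lohi_masked (const int16_t x[NLANES], const int mask[NLANES])
{
  int32_t s = 0;
  for (int h = 0; h < 2; h++)
    for (int i = 0; i < HALF; i++)
      if (mask[h * HALF + i])
        s += x[h * HALF + i];
  return s;
}

/* Even/odd widening: half 0 holds input lanes 0, 2, 4, 6 and half 1
   holds lanes 1, 3, 5, 7, yet the same split mask is applied, so mask
   lanes guard the wrong elements.  */
static int32_t
sum_evenodd_masked (const int16_t x[NLANES], const int mask[NLANES])
{
  int32_t s = 0;
  for (int h = 0; h < 2; h++)
    for (int i = 0; i < HALF; i++)
      if (mask[h * HALF + i])
        s += x[2 * i + h];
  return s;
}
```

With `x = {1, ..., 8}` and the first 3 lanes active, the reference and lo/hi sums are 6, while the even/odd version adds lanes 0, 2 and 4 and gets 9.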
PR tree-optimization/121049
* internal-fn.h (widening_evenodd_fn_p): Declare.
* internal-fn.cc (widening_evenodd_fn_p): New function.
* tree-vect-stmts.cc (vectorizable_conversion): When using
an even/odd widening function disable loop masking.
Andrew Pinski [Tue, 8 Jul 2025 03:01:04 +0000 (20:01 -0700)]
ifconv: simple factor out operators while doing ifcvt [PR119920]
For possible reductions, ifconv currently handles the case where the addition
is on one side of the if. But in the case of PR 119920, the reduction
addition is on both sides of the if.
E.g.
```
if (_27 == 0)
goto <bb 14>; [50.00%]
else
goto <bb 13>; [50.00%]
```
But the vectorizer does not recognize this as a reduction.
To fix this, we should factor out the addition from the `if`.
This allows us to get:
```
iftmp.0_7 = _22 ? b_13(D) : c_12(D);
a_14 = iftmp.0_7 + a_18;
```
The vectorizer then recognizes this as a reduction.
In the cases of PR 112324 and PR 110015, it is similar but with a MAX_EXPR reduction
instead of an addition.
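At the C source level the factoring looks roughly like this. It is an illustration of the idea, not the GIMPLE transformation in tree-if-conv.cc, and all function names are hypothetical. The common operation (an addition, or a max) is pulled out of both arms so only the operand selection remains conditional.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Before: the reduction operation appears on both arms of the if.  */
static int
sum_two_arms (const int *cond, const int *b, const int *c, size_t n)
{
  int a = 0;
  for (size_t i = 0; i < n; i++)
    if (cond[i] == 0)
      a = a + b[i];
    else
      a = a + c[i];
  return a;
}

/* After factoring: select the operand, then do a single addition.  */
static int
sum_factored (const int *cond, const int *b, const int *c, size_t n)
{
  int a = 0;
  for (size_t i = 0; i < n; i++)
    {
      int t = cond[i] == 0 ? b[i] : c[i];
      a = t + a;
    }
  return a;
}

static int max_int (int x, int y) { return x > y ? x : y; }

/* The same shape with MAX as the reduction operation.  */
static int
max_two_arms (const int *cond, const int *b, const int *c, size_t n)
{
  int m = INT_MIN;
  for (size_t i = 0; i < n; i++)
    if (cond[i] == 0)
      m = max_int (m, b[i]);
    else
      m = max_int (m, c[i]);
  return m;
}

static int
max_factored (const int *cond, const int *b, const int *c, size_t n)
{
  int m = INT_MIN;
  for (size_t i = 0; i < n; i++)
    m = max_int (m, cond[i] == 0 ? b[i] : c[i]);
  return m;
}
```

Each factored loop has a single reduction statement fed by a conditional select, which is the form the vectorizer's reduction detection expects.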
Note that while this should be done in phiopt, there are regressions
due to other passes not being able to handle the factored-out cases
(see the bugs linked to PR 64700). I have not yet had time to fix all of the passes
that could handle the addition being in the if/then/else rather than outside it.
So I thought it would be useful to have a localized version in ifconv, which
is then used only for the vectorizer.
* tree-if-conv.cc (find_different_opnum): New function.
(factor_out_operators): New function.
(predicate_scalar_phi): Call factor_out_operators when
there are only 2 elements in a phi.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-reduc-cond-1.c: New test.
* gcc.dg/vect/vect-reduc-cond-2.c: New test.
* gcc.dg/vect/vect-reduc-cond-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>