]> git.ipfire.org Git - thirdparty/gcc.git/log
thirdparty/gcc.git
10 days agoivopts: Change constant_multiple_of to expand aff nodes.
Alfie Richards [Tue, 24 Jun 2025 13:49:27 +0000 (13:49 +0000)] 
ivopts: Change constant_multiple_of to expand aff nodes.

This changes the calls to tree_to_aff_combination in constant_multiple_of to
tree_to_aff_combination_expand along with associated plumbing of ivopts_data
and required cache.

This improves cases such as:

```c
void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) {
    for (unsigned long i = 0; i < end; i += step) {
        svst1(pg, p1, svld1_s32(pg, p2));
        p1 += step;
        p2 += step;
    }
}
```

Where ivopts previously didn't expand the SSA variables for the step increements
and so lacked the ability to group all the IV's and ended up with:

```
f:
cbz x3, .L1
mov x4, 0
.L3:
ld1w z31.s, p0/z, [x1]
add x4, x4, x2
st1w z31.s, p0, [x0]
add x1, x1, x2, lsl 2
add x0, x0, x2, lsl 2
cmp x3, x4
bhi .L3
.L1:
ret
```

After this change we end up with:

```
f:
cbz x3, .L1
mov x4, 0
.L3:
ld1w z31.s, p0/z, [x1, x4, lsl 2]
st1w z31.s, p0, [x0, x4, lsl 2]
add x4, x4, x2
cmp x3, x4
bhi .L3
.L1:
ret
```

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (constant_multiple_of): Change
tree_to_aff_combination to tree_to_aff_combination_expand and add
parameter to take ivopts_data.
(get_computation_aff_1): Change parameters and calls to include
ivopts_data.
(get_computation_aff): Ditto.
(get_computation_at) Ditto.:
(get_debug_computation_at) Ditto.:
(get_computation_cost) Ditto.:
(rewrite_use_nonlinear_expr) Ditto.:
(rewrite_use_address) Ditto.:
(rewrite_use_compare) Ditto.:
(remove_unused_ivs) Ditto.:

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/adr_7.c: New test.

10 days agolibstdc++: Test for %S precision for durations with integral representation.
Tomasz Kamiński [Tue, 24 Jun 2025 11:49:26 +0000 (13:49 +0200)] 
libstdc++: Test for %S precision for durations with integral representation.

Existing test are extented to cover cases where not precision is specified,
or it is specified to zero. The precision value is ignored in all cases.

libstdc++-v3/ChangeLog:

* testsuite/std/time/format/precision.cc: New tests.

10 days agortl-ssa: Rewrite process_uses_of_deleted_def [PR120745]
Richard Sandiford [Wed, 25 Jun 2025 09:44:34 +0000 (10:44 +0100)] 
rtl-ssa: Rewrite process_uses_of_deleted_def [PR120745]

process_uses_of_deleted_def seems to have been written on the assumption
that non-degenerate phis would be explicitly deleted by an insn_change,
and that the function therefore only needed to delete degenerate phis.
But that was inconsistent with the rest of the code, and wouldn't be
very convenient in any case.

This patch therefore rewrites process_uses_of_deleted_def to handle
general phis.

I'm not aware that this fixes any issues in current code, but it is
needed to enable the rtl-ssa dce work that Ondřej and Honza are
working on.

gcc/
PR rtl-optimization/120745
* rtl-ssa/changes.cc (process_uses_of_deleted_def): Rewrite to
handle deletions of non-degenerate phis.

10 days agolibstdc++: Report compilation error on formatting "%d" from month_last [PR120650]
Tomasz Kamiński [Tue, 24 Jun 2025 07:17:12 +0000 (09:17 +0200)] 
libstdc++: Report compilation error on formatting "%d" from month_last [PR120650]

For month_day we incorrectly reported day information to be available, which lead
to format_error being thrown from the call to formatter::format at runtime, instead
of making call to format ill-formed.

The included test cover most of the combinations of _ChronoParts and format
specifiers.

PR libstdc++/120650

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h
(formatter<chrono::month_day_last,_CharT>::parse): Call _M_parse with
only Month being available.
* testsuite/std/time/format/data_not_present_neg.cc: New test.

10 days agox86: Update -mtune=intel for Diamond Rapids/Clearwater Forest
H.J. Lu [Tue, 24 Jun 2025 23:40:31 +0000 (07:40 +0800)] 
x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest

-mtune=intel is used to generate a single binary to run well on both big
core and small core, similar to hybrid CPUs.  Update -mtune=intel to tune
for Diamond Rapids and Clearwater Forest, instead of Silvermont.

PR target/120815
* common/config/i386/i386-common.cc (processor_alias_table):
Replace CPU_SLM/PTA_NEHALEM with CPU_HASWELL/PTA_HASWELL for
PROCESSOR_INTEL.
* config/i386/i386-options.cc (processor_cost_table): Replace
intel_cost with alderlake_cost.
* config/i386/x86-tune-costs.h (intel_cost): Removed.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Treat
PROCESSOR_INTEL like PROCESSOR_ALDERLAKE.
(ix86_adjust_cost): Likewise.
* doc/invoke.texi: Update -mtune=intel for Diamond Rapids and
Clearwater Forest.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
10 days agoi386: Remove CLDEMOTE for clients
Haochen Jiang [Wed, 25 Jun 2025 02:34:37 +0000 (10:34 +0800)] 
i386: Remove CLDEMOTE for clients

CLDEMOTE is not enabled on clients according to SDM. SDM only mentioned
it will be enabled on Xeon and Atom servers, not clients. Remove them
since Alder Lake (where it is introduced).

gcc/ChangeLog:

* config/i386/i386.h (PTA_ALDERLAKE): Use PTA_GOLDMONT_PLUS
as base to remove PTA_CLDEMOTE.
(PTA_SIERRAFOREST): Add PTA_CLDEMOTE since PTA_ALDERLAKE
does not include that anymore.
* doc/invoke.texi: Update texi file.

10 days agoRISC-V: Add Profiles RVA/B23S64 support.
Jiawei [Tue, 24 Jun 2025 09:34:05 +0000 (17:34 +0800)] 
RISC-V: Add Profiles RVA/B23S64 support.

This patch adds support for the RISC-V Profiles RVA23S64 and RVB23S64.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New Profiles.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-rva23s.c: New test.
* gcc.target/riscv/arch-rvb23s.c: New test.

10 days agoAdd -fauto-profile-inlining
Jan Hubicka [Wed, 25 Jun 2025 01:01:29 +0000 (03:01 +0200)] 
Add -fauto-profile-inlining

this patch adds -fauto-profile-inlining which can be used to control
the auto-profile directed inlning.

gcc/ChangeLog:

* common.opt: (fauto-profile-inlining): New
* doc/invoke.texi (-fauto-profile-inlining): Document.
* ipa-inline.cc (inline_functions_by_afdo): Check
flag_auto_profile.
(early_inliner): Also do inline_functions_by_afdo with
!flag_early_inlining.

10 days agoRemove early inlining from afdo pass
Jan Hubicka [Wed, 25 Jun 2025 00:59:54 +0000 (02:59 +0200)] 
Remove early inlining from afdo pass

This pass removes early-inlining from afdo pass since all inlining should now
happen from early inliner.  I tedted this on spec and there are 3 inlines
happening here which are blocked at early-inline time by hitting large function
growth limit.  We probably want to bypass that limit, I will look into that
incrementaly.

This should make the non-inlined function profile merging hopefully easier.

It may still make sense to separate afdo inliner from early inliner to solve
the non-transitivity issues which is not that hard to do with current code
orgnaization. However this should be separate IPA pass rather then another
part of afdo pass, since it can be coneptually separate.

gcc/ChangeLog:

* auto-profile.cc: Update toplevel comment.
(early_inline): Remove.
(auto_profile): Don't do early inlining.

10 days agoDaily bump.
GCC Administrator [Wed, 25 Jun 2025 00:19:34 +0000 (00:19 +0000)] 
Daily bump.

10 days agogcn: Fix glc vs. sc0 handling for scalar memory access
Tobias Burnus [Tue, 24 Jun 2025 21:55:27 +0000 (23:55 +0200)] 
gcn: Fix glc vs. sc0 handling for scalar memory access

gfx942 still uses glc for scalar access ('s_...') and only uses
sc0/nt/sc1 for vector access.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (TARGET_GLC_NAME): Fix and extend the
description in the comment.
* config/gcn/gcn.cc (print_operand): Extend the comment about
'G' and 'g'.
* config/gcn/gcn.md: Use 'glc' instead of %G where appropriate.

10 days agoFortran/OpenACC: Add Fortran support for acc_attach/acc_detach
Tobias Burnus [Tue, 24 Jun 2025 21:28:57 +0000 (23:28 +0200)] 
Fortran/OpenACC: Add Fortran support for acc_attach/acc_detach

While C/++ support the routines acc_attach{,_async} and
acc_detach{,_finalize}{,_async} routines since a long time, the Fortran
API routines where only added in OpenACC 3.3.

Unfortunately, they cannot directly be implemented in the library as
GCC will introduce a temporary array descriptor in some cases, which
causes the attempted attachment to the this temporary variable instead
of to the original one.

Therefore, those API routines are handled in a special way in the compiler.

gcc/fortran/ChangeLog:

* trans-stmt.cc (gfc_trans_call_acc_attach_detach): New.
(gfc_trans_call): Call it.

libgomp/ChangeLog:

* libgomp.texi (acc_attach, acc_detach): Update for Fortran
version.
* openacc.f90 (acc_attach{,_async}, acc_detach{,_finalize}{,_async}):
Add.
* openacc_lib.h: Likewise.
* testsuite/libgomp.oacc-fortran/acc-attach-detach-1.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-attach-detach-2.f90: New test.

10 days agoRISC-V: Add patterns for vector-scalar multiply-(subtract-)accumulate [PR119100]
Paul-Antoine Arras [Tue, 24 Jun 2025 21:42:50 +0000 (15:42 -0600)] 
RISC-V: Add patterns for vector-scalar multiply-(subtract-)accumulate [PR119100]

This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a plus-mult or minus-mult RTL instruction.

Before this patch, we have two instructions, e.g.:
  vfmv.v.f       v6,fa0
  vfmacc.vv      v2,v6,v4

After, we get only one:
  vfmacc.vf      v2,fa0,v4

PR target/119100

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*<optab>_vf_<mode>): Handle both add and
acc FMA variants.
* config/riscv/vector.md (*pred_mul_<optab><mode>_scalar_undef): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmacc and vfmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: Add support for acc
variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_run.h: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: Define
TEST_OUT.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f64.c: New test.

10 days agoFortran: fix ICE in verify_gimple_in_seq with substrings [PR120743]
Harald Anlauf [Tue, 24 Jun 2025 18:46:38 +0000 (20:46 +0200)] 
Fortran: fix ICE in verify_gimple_in_seq with substrings [PR120743]

PR fortran/120743

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_substring): Substring indices are of
type gfc_charlen_type_node.  Convert to size_type_node for
pointer arithmetic only after offset adjustments have been made.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr120743.f90: New test.

Co-authored-by: Jerry DeLisle <jvdelisle@gcc.gnu.org>
Co-authored-by: Mikael Morin <mikael@gcc.gnu.org>
10 days agoc++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]
Jakub Jelinek [Tue, 24 Jun 2025 17:00:11 +0000 (19:00 +0200)] 
c++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]

The following patch implements the P3618R0 paper by tweaking pedwarn
condition, adjusting pedwarn wording, adjusting one testcase and adding 4
new ones.  The paper was voted in as DR, so it isn't guarded on C++ version.

2025-06-24  Jakub Jelinek  <jakub@redhat.com>

PR c++/120773
* decl.cc (grokfndecl): Implement C++26 P3618R0 - Allow attaching
main to the global module.  Only pedwarn for current_lang_name
other than lang_name_cplusplus and adjust pedwarn wording.

* g++.dg/parse/linkage5.C: Don't expect error on
extern "C++" int main ();.
* g++.dg/parse/linkage7.C: New test.
* g++.dg/parse/linkage8.C: New test.
* g++.dg/modules/main-2.C: New test.
* g++.dg/modules/main-3.C: New test.

10 days agoi386: Convert LEA stack adjust insn to SUB when FLAGS_REG is dead
Uros Bizjak [Tue, 24 Jun 2025 09:02:02 +0000 (11:02 +0200)] 
i386: Convert LEA stack adjust insn to SUB when FLAGS_REG is dead

ADD/SUB is faster than LEA for most processors. Also, there are
several peephole2 patterns available that convert prologue esp
subtractions to pushes (at the end of i386.md). These process only
patterns with flags reg clobber, so they are ineffective
with clobber-less stack ptr adjustments, introduced by r16-1551
("x86: Enable separate shrink wrapping").

Introduce a peephole2 pattern that adds a clobber to a clobber-less
stack ptr adjustments when FLAGS_REG is dead.

gcc/ChangeLog:

* config/i386/i386.md
(@pro_epilogue_adjust_stack_add_nocc<mode>): Add type attribute.
(pro_epilogue_adjust_stack_add_nocc peephole2 pattern):
Convert pro_epilogue_adjust_stack_add_nocc variant to
pro_epilogue_adjust_stack_add when FLAGS_REG is dead.

10 days agoRemove non-SLP path from vectorizable_load
Richard Biener [Tue, 24 Jun 2025 12:38:19 +0000 (14:38 +0200)] 
Remove non-SLP path from vectorizable_load

This cleans the rest of vectorizable_load from non-SLP, propagates
out ncopies == 1, and elides loops from 0 to ncopies.

* tree-vect-stmts.cc (vectorizable_load): Remove non-SLP
paths and propagate out ncopies == 1.

10 days agodiagnostic: fix for older version of GCC
Marc Poulhiès [Tue, 24 Jun 2025 13:12:30 +0000 (15:12 +0200)] 
diagnostic: fix for older version of GCC

Having both an enum and a variable with the same name triggers an error with
gcc 5.

gcc/ChangeLog:
* diagnostic-state-to-dot.cc (get_color_for_dynalloc_state):
Rename argument dynalloc_state to dynalloc_st.
(add_title_tr): Rename argument style to styl.
(on_xml_node): Rename local variable dynalloc_state to dynalloc_st.

10 days agolibstdc++: Unnecessary type completion in __is_complete_or_unbounded [PR120717]
Patrick Palka [Tue, 24 Jun 2025 13:33:25 +0000 (09:33 -0400)] 
libstdc++: Unnecessary type completion in __is_complete_or_unbounded [PR120717]

When checking __is_complete_or_unbounded on a reference to incomplete
type, we overeagerly try to instantiate/complete the referenced type
which besides being unnecessary may also produce an unexpected
-Wsfinae-incomplete warning (added in r16-1527) if the referenced type
is later defined.

This patch fixes this by effectively restricting the sizeof check to
object (except unknown-bound array) types.  In passing simplify the
implementation by using is_object instead of is_function/reference/void
and introducing a __maybe_complete_object_type helper.

PR libstdc++/120717

libstdc++-v3/ChangeLog:

* include/std/type_traits (__maybe_complete_object_type): New
helper trait, factored out from ...
(__is_complete_or_unbounded): ... here.  Only check sizeof on a
__maybe_complete_object_type type.  Fix formatting.
* testsuite/20_util/is_complete_or_unbounded/120717.cc: New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
10 days agogcc: remove atan from edom_only_function
Yuao Ma [Mon, 23 Jun 2025 16:06:16 +0000 (00:06 +0800)] 
gcc: remove atan from edom_only_function

According to the man page, atan does not produce an error. According to the C23
standard draft (N3088), a range error occurs for atan if a nonzero x is too
close to zero. Neither of them mentions that atan will result in a domain error.

gcc/ChangeLog:

* tree-call-cdce.cc (edom_only_function): Remove atan.

Signed-off-by: Yuao Ma <c8ef@outlook.com>
11 days agos390: Fix float vector extract for pre-z13
Juergen Christ [Wed, 18 Jun 2025 13:16:28 +0000 (15:16 +0200)] 
s390: Fix float vector extract for pre-z13

Also provide the vec_extract patterns for floats on pre-z13 machines
to prevent ICEing in those cases.

gcc/ChangeLog:

* config/s390/vector.md (VF): Don't restrict modes.
(VEC_SET_SINGLEFLOAT): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-extract-1.c: Fix test on arch11.
* gcc.target/s390/vector/vec-set-1.c: Run test on arch11.
* gcc.target/s390/vector/vec-extract-2.c: New test.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
11 days agoAArch64: promote aarch64-autovec-peference to mautovec-preference
Tamar Christina [Tue, 24 Jun 2025 10:11:36 +0000 (11:11 +0100)] 
AArch64: promote aarch64-autovec-peference to mautovec-preference

As requested in my patch for -mmax-vectorization this promotes the parameter
--param aarch64-autovec-preference to a first class top target flag.

If both the parameter and the flag is specified the parameter takes precedence
with the reasoning that it may already be embedded in build systems.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options_internal): Set
value of parameter based on option.
* config/aarch64/aarch64.opt (autovec-preference): New.
* doc/invoke.texi (autovec-preference): Document it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/autovec_param_asimd-only_2.c: New test.
* gcc.target/aarch64/autovec_param_default_2.c: New test.
* gcc.target/aarch64/autovec_param_prefer-asimd_2.c: New test.
* gcc.target/aarch64/autovec_param_prefer-sve_2.c: New test.
* gcc.target/aarch64/autovec_param_sve-only_2.c: New test.

11 days agoAArch64: propose -mmax-vectorization as an option to override vector costing
Tamar Christina [Tue, 24 Jun 2025 10:10:11 +0000 (11:10 +0100)] 
AArch64: propose -mmax-vectorization as an option to override vector costing

With the middle-end providing a way to make vectorization more profitable by
scaling vect-scalar-cost-multiplier this makes a more user friendly option
to make it easier to use.

I propose making it an actual -m option that we document and retain vs using
the parameter name.  In the future I would like to extend this option to modify
additional costing in the AArch64 backend itself.

This can be used together with --param aarch64-autovec-preference to get the
vectorizer to say, always vectorize with SVE.  I did consider making this an
additional enum to --param aarch64-autovec-preference but I also think this is
a useful thing to be able to set with pragmas and attributes, but am open to
suggestions.

Note that as a follow up I plan on extending -fdump-tree-vect to support -stats
which is then intended to be usable with this flag.

gcc/ChangeLog:

* config/aarch64/aarch64.opt (max-vectorization): New.
* config/aarch64/aarch64.cc (aarch64_override_options_internal): Save
and restore option.
Implement it through vect-scalar-cost-multiplier.
(aarch64_attributes): Default to off.
* common/config/aarch64/aarch64-common.cc (aarch64_handle_option):
Initialize option.
* doc/extend.texi (max-vectorization): Document attribute.
* doc/invoke.texi (max-vectorization): Document flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/cost_model_17.c: New test.
* gcc.target/aarch64/sve/cost_model_18.c: New test.

11 days agofortran: Mention user variable in SELECT TYPE temporary variable names
Mikael Morin [Fri, 20 Jun 2025 10:08:02 +0000 (12:08 +0200)] 
fortran: Mention user variable in SELECT TYPE temporary variable names

The temporary variables that are generated to implement SELECT TYPE
and TYPE IS statements have (before this change) a name depending only
on the type.  This can produce confusing dumps with code having multiple
SELECT TYPE statements, as it isn't obvious which SELECT TYPE construct
the variable relates to.  This is especially the case with nested SELECT
TYPE statements and with SELECT TYPE variables having identical types
(and thus identical names).

This change adds one additional user-provided discriminating string in
the variable names, using the value from the SELECT TYPE variable name
or last component reference name.  The additional string may be
truncated to fit in the temporary buffer.  This requires all buffers to
have matching sizes to get the same resulting name everywhere.

gcc/fortran/ChangeLog:

* misc.cc (gfc_var_name_for_select_type_temp): New function.
* gfortran.h (gfc_var_name_for_select_type_temp): Declare it.
* resolve.cc (resolve_select_type): Pick a discriminating name
from the SELECT TYPE variable reference and use it in the name
of the temporary variable that is generated.  Truncate name to
the buffer size.
* match.cc (select_type_set_tmp): Likewise.  Pass the
discriminating name...
(select_intrinsic_set_tmp): ... to this function.  Use the
discriminating name likewise.  Augment the buffer size to match
that of select_type_set_tmp and resolve_select_type.

gcc/testsuite/ChangeLog:

* gfortran.dg/select_type_51.f90: New test.

11 days agoDon't duplicate setup code cost when do group-candidate cost calucalution.
hongtao.liu [Wed, 5 Mar 2025 11:25:32 +0000 (12:25 +0100)] 
Don't duplicate setup code cost when do group-candidate cost calucalution.

-  /* Uses in a group can share setup code, so only add setup cost once.  */
-  cost -= cost.scratch;

It looks like the original code took into account avoiding double
counting, but unfortunately cost is reset inside the follow loop which
invalidates the upper code, and makes same setup code cost duplicated in
each use of the group.

The patch fix the issue. It can also improve 548.exchange_r by 6% with
-march=x86-64-v3 -O2 due to better ivopt on EMR.

No big performance impact for SPEC2017 on graviton4/SPR with -mcpu=native
-Ofast -fomit-framepointer -flto=auto.

gcc/ChangeLog:

PR target/115842
* tree-ssa-loop-ivopts.cc (determine_group_iv_cost_address):
Don't recalculate inv_expr when group-candidate cost
calucalution.

11 days agomiddle-end: Apply loop->unroll directly in vectorizer
Tamar Christina [Tue, 24 Jun 2025 06:14:27 +0000 (07:14 +0100)] 
middle-end: Apply loop->unroll directly in vectorizer

Consider the loop

void f1 (int *restrict a, int n)
{
#pragma GCC unroll 4 requested
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}

Which today is vectorized and then unrolled 3x by the RTL unroller due to the
use of the pragma.  This is unfortunate because the pragma was intended for the
scalar loop but we end up with an unrolled vector loop and a longer path to the
entry which has a low enough VF requirement to enter.

This patch instead seeds the suggested_unroll_factor with the value the user
requested and instead uses it to maintain the total VF that the user wanted the
scalar loop to maintain.

In effect it applies the unrolling inside the vector loop itself.  This has the
benefits for things like reductions, as it allows us to split the accumulator
and so the unrolled loop is more efficient.  For early-break it allows the
cbranch call to be shared between the unrolled elements, giving you more
effective unrolling because it doesn't need the repeated cbranch which can be
expensive.

The target can then choose to create multiple epilogues to deal with the "rest".

The example above now generates:

.L4:
        ldr     q31, [x2]
        add     v31.4s, v31.4s, v31.4s
        str     q31, [x2], 16
        cmp     x2, x3
        bne     .L4

as V4SI maintains the requested VF, but e.g. pragma unroll 8 generates:

.L4:
        ldp     q30, q31, [x2]
        add     v30.4s, v30.4s, v30.4s
        add     v31.4s, v31.4s, v31.4s
        stp     q30, q31, [x2], 32
        cmp     x3, x2
        bne     .L4

gcc/ChangeLog:

* doc/extend.texi: Document pragma unroll interaction with vectorizer.
* tree-vectorizer.h (LOOP_VINFO_USER_UNROLL): New.
(class _loop_vec_info): Add user_unroll.
* tree-vect-loop.cc (vect_analyze_loop_1): Set
suggested_unroll_factor and retry.
(_loop_vec_info::_loop_vec_info): Initialize user_unroll.
(vect_transform_loop): Clear the loop->unroll value if the pragma was
used.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/unroll-vect.c: New test.

11 days agomiddle-end: replace log_vf usages with vf to allow support for non-power of two vf
Tamar Christina [Tue, 24 Jun 2025 06:13:22 +0000 (07:13 +0100)] 
middle-end: replace log_vf usages with vf to allow support for non-power of two vf

This patch fixes a bug where the current code assumed that exact_log2 returns
NULL on failure, but it instead returns -1.  So there are some cases where the
right shift could shift out the entire value.

Secondly it also removes the requirement that VF be a power of two.  With an
uneven unroll factor we can easily end up with a non-power of two VF which SLP
can handle. This replaces shifts with multiplication and division.

The 32-bit x86 testcase from PR64110 was always wrong, it used to match by pure
coincidence a vmovd inside the vector loop.  What it intended to match was that
the argument to the function isn't spilled and then reloaded from the stack for
no reason.

But on 32-bit x86 all arguments are passed on the stack anyway and so the match
would have never worked.  The patch seems to simplify the loop preheader which
gets it to remove an intermediate zero extend which causes the match to now
properly fail.

As such I'm skipping the test on 32-bit x86.

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_gen_vector_loop_niters,
vect_gen_vector_loop_niters_mult_vf): Remove uses of log_vf.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr64110.c: Update testcase.

11 days agox86: Extend the remove_redundant_vector pass
H.J. Lu [Thu, 8 May 2025 23:17:07 +0000 (07:17 +0800)] 
x86: Extend the remove_redundant_vector pass

Extend the remove_redundant_vector pass to handle vector broadcasts from
constant and variable scalars.  When broadcasting from constants and
function arguments, we can place a single widest vector broadcast at
entry of the nearest common dominator for basic blocks with all uses
since constants and function arguments aren't changed.  For broadcast
from variables with a single definition, the single definition is
replaced with the widest broadcast.

gcc/

PR target/92080
* config/i386/i386-expand.cc (ix86_expand_call): Set
recursive_function to true for recursive call.
* config/i386/i386-features.cc (ix86_place_single_vector_set):
Add an argument for inner scalar, default to nullptr.  Set the
source from inner scalar if not nullptr.
(ix86_get_vector_load_mode): Renamed to ...
(ix86_get_vector_cse_mode): This.  Add an argument for scalar mode
and handle integer and float scalar modes.
(replace_vector_const): Add an argument for scalar mode and pass
it to ix86_get_vector_load_mode.
(x86_cse_kind): New.
(redundant_load): Likewise.
(ix86_broadcast_inner): Likewise.
(remove_redundant_vector_load): Also support const0_rtx and
constm1_rtx broadcasts.  Handle vector broadcasts from constant
and variable scalars.
* config/i386/i386.h (machine_function): Add recursive_function.

gcc/testsuite/

* gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to expect
movdqa instead pxor.
* gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise.
* gcc.target/i386/keylocker-aesencwide128kl.c: Likewise.
* gcc.target/i386/keylocker-aesencwide256kl.c: Likewise.
* gcc.target/i386/pr92080-4.c: New test.
* gcc.target/i386/pr92080-5.c: Likewise.
* gcc.target/i386/pr92080-6.c: Likewise.
* gcc.target/i386/pr92080-7.c: Likewise.
* gcc.target/i386/pr92080-8.c: Likewise.
* gcc.target/i386/pr92080-9.c: Likewise.
* gcc.target/i386/pr92080-10.c: Likewise.
* gcc.target/i386/pr92080-11.c: Likewise.
* gcc.target/i386/pr92080-12.c: Likewise.
* gcc.target/i386/pr92080-13.c: Likewise.
* gcc.target/i386/pr92080-14.c: Likewise.
* gcc.target/i386/pr92080-15.c: Likewise.
* gcc.target/i386/pr92080-16.c: Likewise.
* gcc.target/i386/pr92080-17.c: Likewise.
* gcc.target/i386/pr92080-18.c: Likewise.
* gcc.target/i386/pr92080-19.c: Likewise.
* gcc.target/i386/pr92080-20.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
11 days agox86: Update memcpy/memset inline strategies for -mtune=generic
H.J. Lu [Fri, 19 Mar 2021 01:43:10 +0000 (18:43 -0700)] 
x86: Update memcpy/memset inline strategies for -mtune=generic

Update memcpy and memset inline strategies for -mtune=generic:

1. Don't align memory.
2. For known sizes, prefer vector loop, unroll loop with 4 moves or
   stores per iteration without aligning the loop, up to 256 bytes.
3. For unknown sizes, use memcpy/memset.
4. Since each loop iteration has 4 stores and 8 stores for zeroing with
   unroll loop may be needed, change CLEAR_RATIO to 10 so that zeroing
   up to 72 bytes are fully unrolled with 9 stores without SSE.

gcc/

PR target/70308
PR target/101366
PR target/102294
PR target/108585
PR target/118276
PR target/119596
PR target/119703
PR target/119704
* config/i386/x86-tune-costs.h (generic_memcpy): Updated.
(generic_memset): Likewise.
(generic_cost): Change CLEAR_RATIO to 10.

gcc/testsuite/

PR target/70308
PR target/101366
PR target/102294
PR target/108585
PR target/118276
PR target/119596
PR target/119703
PR target/119704
* g++.target/i386/memset-pr101366-1.C: New test.
* g++.target/i386/memset-pr101366-2.C: Likewise.
* g++.target/i386/memset-pr108585-1a.C: Likewise.
* g++.target/i386/memset-pr108585-1b.C: Likewise.
* g++.target/i386/memset-pr118276-1a.C: Likewise.
* g++.target/i386/memset-pr118276-1b.C: Likewise.
* g++.target/i386/memset-pr118276-1c.C: Likewise.
* gcc.target/i386/memcpy-strategy-12.c: Likewise.
* gcc.target/i386/memcpy-strategy-13.c: Likewise.
* gcc.target/i386/memset-pr70308-1a.c: Likewise.
* gcc.target/i386/memset-pr70308-1b.c: Likewise.
* gcc.target/i386/memset-strategy-25.c: Likewise.
* gcc.target/i386/memset-strategy-26.c: Likewise.
* gcc.target/i386/memset-strategy-27.c: Likewise.
* gcc.target/i386/memset-strategy-28.c: Likewise.
* gcc.target/i386/memset-strategy-29.c: Likewise.
* gcc.target/i386/memset-strategy-30.c: Likewise.
* gcc.target/i386/memset-strategy-31.c: Likewise.
* gcc.target/i386/auto-init-padding-3.c: Expect XMM stores.
* gcc.target/i386/auto-init-padding-9.c: Likewise.
* gcc.target/i386/mvc17.c: Fail with "rep mov"
* gcc.target/i386/pr111657-1.c: Scan for unrolled loop.  Fail
with "rep mov".
* gcc.target/i386/shrink_wrap_1.c: Also pass
-mmemset-strategy=rep_8byte:-1:align.
* gcc.target/i386/sw-1.c: Also pass -mstringop-strategy=rep_byte.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
11 days agoCopy discriminators when inlining
Jan Hubicka [Tue, 24 Jun 2025 03:07:42 +0000 (05:07 +0200)] 
Copy discriminators when inlining

When inlining disciriminator info about the call statement is lost which
is not good for auto-profile and debug info quality.  This patch fixes
it.

gcc/ChangeLog:

* tree-inline.cc (expand_call_inline): Preserve discriminator.

11 days agoFix AFDO zero profile handling
Jan Hubicka [Tue, 24 Jun 2025 03:00:01 +0000 (05:00 +0200)] 
Fix AFDO zero profile handling

This patch fixes roms autofdo regression I introduced yesterday.  What happens
is that loop vectorization is disabled, because we get loop header count 0.
I.e.

loop_header:  <count 0>
  if (i < n)
    goto exit;
loop_body:    <count large>
  ... vectorizable computation ...

The reason is that "if (i < 0)" statement actually has 0 profile in AFDO
feedback.  This seems common and I believe it is an issue with debug info in
loop vecotrizer.  Because loop is vectorized during train run, the conditoinal
is replaced by vectorized loop conditional but the statement remains in the
loop epilogue which is not executed at runtime.

This is something we can fix and introduce debug statement in the vectorized loop
body so user can breakpoint on it. I will try to produce testcase for that.

However this patch fixes bug where I intended to only trust 0 counts from AFDO if they
are also 0 in static profile and reversed the conditinal.

autoprofile-bootstrapped/regtested x86_64-linux, comitted.

* auto-profile.cc (afdo_set_bb_count): Dump also 0 count stmts.
(afdo_annotate_cfg): Fix conditional for block having non-zero static
profile.

11 days agoFix shrink wrap separate ICE for mingw [PR120741]
Lili Cui [Tue, 24 Jun 2025 02:49:43 +0000 (10:49 +0800)] 
Fix shrink wrap separate ICE for mingw [PR120741]

gcc/ChangeLog:

PR target/120741
* config/i386/i386.cc (ix86_expand_prologue):
Remove 1 assertion.

gcc/testsuite/ChangeLog:

PR target/120741
* gcc.target/i386/pr120741.c: New test.
* gcc.target/i386/shrink-wrap-separate-mingw.c: Likewise.

11 days ago[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V
Jeff Law [Tue, 24 Jun 2025 00:27:49 +0000 (18:27 -0600)] 
[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V

Fix typo in comment spotted by Peter B.

PR target/118241
gcc/
* config/riscv/predicates.md: Fix comment typo in recent change.

11 days agoDaily bump.
GCC Administrator [Tue, 24 Jun 2025 00:18:59 +0000 (00:18 +0000)] 
Daily bump.

11 days agoFixup dropping REG_EQUAL note in ext-dce
Sam James [Mon, 23 Jun 2025 22:28:01 +0000 (23:28 +0100)] 
Fixup dropping REG_EQUAL note in ext-dce

Followup to r16-1613-g34e1e5e33ec3eb. remove_reg_equal_equiv_notes's
2nd argument is 'no_rescan' which we accidentally had on, tripping
an assert in combine or ira because we hadn't left things in a consistent
state.

Fix the thinko by enabling rescanning.

gcc/ChangeLog:
PR rtl-optimization/120795

* ext-dce.cc (ext_dce_try_optimize_insn): Enable rescan in
remove_reg_equal_equiv_notes call.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
11 days agolibgdiagnostics: sarif-replay: add extra sinks via -fdiagnostics-add-output= [PR11679...
David Malcolm [Mon, 23 Jun 2025 22:46:51 +0000 (18:46 -0400)] 
libgdiagnostics: sarif-replay: add extra sinks via -fdiagnostics-add-output= [PR116792,PR116163]

This patch refactors the support for -fdiagnostics-add-output=SCHEME
from GCC's options parsing so that it is also available to
sarif-replay and to other clients of libgdiagnostics.

With this users of sarif-replay and other such tools can generate HTML
or SARIF as well as text output, using the same
  -fdiagnostics-add-output=SCHEME
as GCC.

As a test, the patch adds support for this option to the dg-lint
script below "contrib".  For example dg-lint can now generate text,
html, and sarif output via:

  LD_LIBRARY_PATH=../build/gcc/ \
    ./contrib/dg-lint/dg-lint \
contrib/dg-lint/test-*.c \
-fdiagnostics-add-output=experimental-html:file=dg-lint-tests.html \
        -fdiagnostics-add-output=sarif:file=dg-lint-tests.sarif

where the HTML output from dg-lint can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2025-06-20/dg-lint-tests.html
the sarif output here:
  https://dmalcolm.fedorapeople.org/gcc/2025-06-23/dg-lint-tests.sarif
and a screenshot of VS Code viewing the sarif output is here:
  https://dmalcolm.fedorapeople.org/gcc/2025-06-23/vscode-viewing-dg-lint-sarif-output.png

As well as allowing sarif-replay to generate HTML, this patch allows
sarif-replay to also generate SARIF.  Ideally this would faithfully
round-trip all the data, but it's not perfect (which I'm tracking as
PR sarif-replay/120792).

contrib/ChangeLog:
PR other/116792
PR testsuite/116163
PR sarif-replay/120792
* dg-lint/dg-lint: Add -fdiagnostics-add-output.
* dg-lint/libgdiagnostics.py: Add
diagnostic_manager_add_sink_from_spec.
(Manager.add_sink_from_spec): New.

gcc/ChangeLog:
PR other/116792
PR testsuite/116163
PR sarif-replay/120792
* Makefile.in (OBJS-libcommon): Add diagnostic-output-spec.o.
* diagnostic-format-html.cc (html_builder::html_builder): Ensure
title is non-empty.
* diagnostic-output-spec.cc: New file, taken from material in
opts-diagnostic.cc.
* diagnostic-output-spec.h: New file.
* diagnostic.cc (diagnostic_context::set_main_input_filename):
New.
* diagnostic.h (diagnostic_context::set_main_input_filename): New
decl.
* doc/libgdiagnostics/topics/compatibility.rst
(LIBGDIAGNOSTICS_ABI_2): New.
* doc/libgdiagnostics/topics/diagnostic-manager.rst
(diagnostic_manager_add_sink_from_spec): New.
(diagnostic_manager_set_analysis_target): New.
* libgdiagnostics++.h (manager::add_sink_from_spec): New.
(manager::set_analysis_target): New.
* libgdiagnostics.cc: Include "diagnostic-output-spec.h".
(struct spec_context): New.
(diagnostic_manager_add_sink_from_spec): New.
(diagnostic_manager_set_analysis_target): New.
* libgdiagnostics.h
(LIBDIAGNOSTICS_HAVE_diagnostic_manager_add_sink_from_spec): New
define.
(diagnostic_manager_add_sink_from_spec): New decl.
(LIBDIAGNOSTICS_HAVE_diagnostic_manager_set_analysis_target): New
define.
(diagnostic_manager_set_analysis_target): New decl.
* libgdiagnostics.map (LIBGDIAGNOSTICS_ABI_2): New.
* libsarifreplay.cc (sarif_replayer::handle_artifact_obj): Looks
for "analysisTarget" in roles and call set_analysis_target using
the artifact if found.
* opts-diagnostic.cc: Refactor, moving material to
diagnostic-output-spec.cc.
(struct opt_spec_context): New.
(handle_OPT_fdiagnostics_add_output_): Use opt_spec_context.
(handle_OPT_fdiagnostics_set_output_): Likewise.
* sarif-replay.cc: Define INCLUDE_STRING.
(struct options): Add m_extra_output_specs.
(usage_msg): Add -fdiagnostics-add-output=SCHEME.
(str_starts_with): New.
(parse_options): Add -fdiagnostics-add-output=SCHEME.
(main): Likewise.
* selftest-run-tests.cc (selftest::run_tests): Call
diagnostic_output_spec_cc_tests rather than
opts_diagnostic_cc_tests.
* selftest.h (selftest::diagnostic_output_spec_cc_tests):
Replace...
(selftest::opts_diagnostic_cc_tests): ...this.

gcc/testsuite/ChangeLog:
PR other/116792
PR testsuite/116163
PR sarif-replay/120792
* sarif-replay.dg/2.1.0-valid/signal-1-check-html.py: New test
script.
* sarif-replay.dg/2.1.0-valid/signal-1.c.sarif: Add html and sarif
generation to options.  Invoke the new script to verify that HTML
and SARIF is generated.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
11 days agoanalyzer: fix missing "final override"
David Malcolm [Mon, 23 Jun 2025 22:46:44 +0000 (18:46 -0400)] 
analyzer: fix missing "final override"

No functional change intended.

gcc/analyzer/ChangeLog:
* region-model.cc
(exception_thrown_from_unrecognized_call::print): Add
"final override" to vfunc.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
11 days agoOpenACC: Add 'if' clause to 'acc wait' directive
Tobias Burnus [Mon, 23 Jun 2025 21:24:56 +0000 (23:24 +0200)] 
OpenACC: Add 'if' clause to 'acc wait' directive

OpenACC 3.0 added the 'if' clause to four directives; this patch only adds
it to 'acc wait'.

gcc/c-family/ChangeLog:

* c-omp.cc (c_finish_oacc_wait): Handle if clause.

gcc/c/ChangeLog:

* c-parser.cc (OACC_WAIT_CLAUSE_MASK): Add if clause.

gcc/cp/ChangeLog:

* parser.cc (OACC_WAIT_CLAUSE_MASK): Ass if clause.

gcc/fortran/ChangeLog:

* openmp.cc (OACC_WAIT_CLAUSES): Add if clause.
* trans-openmp.cc (gfc_trans_oacc_wait_directive): Handle it.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/acc-wait-1.c: New test.
* gfortran.dg/goacc/acc-wait-1.f90: New test.

11 days agoFortran: fix checking of renamed-on-use interface name [PR120784]
Harald Anlauf [Mon, 23 Jun 2025 19:33:40 +0000 (21:33 +0200)] 
Fortran: fix checking of renamed-on-use interface name [PR120784]

PR fortran/120784

gcc/fortran/ChangeLog:

* interface.cc (gfc_match_end_interface): If a use-associated
symbol is renamed, use the local_name for checking.

gcc/testsuite/ChangeLog:

* gfortran.dg/interface_63.f90: New test.

11 days agocontrib: handle GDB's 'unexpected core files' count
Andrew Burgess [Mon, 23 Jun 2025 15:17:19 +0000 (16:17 +0100)] 
contrib: handle GDB's 'unexpected core files' count

This commit is for the benefit of GDB, but as the binutils-gdb
repository shares the contrib/ directory with gcc, this commit must
first be applied to gcc then copied back to binutils-gdb.

This commit extends the two scripts contrib/dg-extract-results.{py,sh}
to handle GDB's 'unexpected core files' count.  This test result type
should never appear in GCC, or any other tool that shares the contrib/
directory, so this change should be harmless for others.

The 'unexpected core files' count was added to GDB's results by this
series:

  https://inbox.sourceware.org/gdb-patches/20220623183053.172430-1-pedro@palves.net

this count is added to the gdb.sum file after all the tests have run,
and counts up any core.* files that have appeared.

GDB also has a make-check-all.sh script which runs a test with all the
different board files that GDB supports.  After each test is run the
'unexpected core files' count will be added to that board's results.

I'm now trying to use the dg-extract-results.* scripts to merge the
results from all the different board files, and the 'unexpected core
files' count is confusing these scripts.

contrib/ChangeLog:

* dg-extract-results.py: Handle GDB's unexpected core file count.
* dg-extract-results.sh: Likewise.

11 days agodiagnostics: add state diagrams to analyzer experimental-html output [PR116792]
David Malcolm [Mon, 23 Jun 2025 15:06:43 +0000 (11:06 -0400)] 
diagnostics: add state diagrams to analyzer experimental-html output [PR116792]

This patch adds various support for debugging diagnostic paths and
events, intended initially for myself to help with debugging -fanalyzer.

It adds the optional ability for a diagnostic_event to supply a
description of the predicted state of the program at that point along
the diagnostic_path.  To isolate the diagnostic subsystem from the
analyzer, this representation is currently an xml::document with custom
elements.  The XML representation is similar to the analyzer's internal
state but can be easier to read - for example, rather than storing the
contents of memory via byte offsets, it uses fields for structs and
element indexes for arrays, recursively.

These states are handled by the HTML and SARIF diagnostic sinks.

The SARIF sink simply embeds the XML as a string in a property bag of the
threadFlowLocation object (SARIF v2.1.0 section 3.38).

For HTML output, the "experimental-html" sink gains a new
"show-state-diagrams=yes" option i.e.:
  -fdiagnostics-add-output=experimental-html:show-state-diagrams=yes
which converts the state XML into SVG diagrams visualizing the state of
memory at each event, inspired by the "ddd" debugger.  These can be seen
by pressing 'j' and 'k' to single-step forward and backward through
events, making it *much* easier to debug -fanalyzer.

An example of output can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2025-06-23/state-diagram-1.c.html
showing an issue in a singly-linked list; there are various other
examples in the parent directory.

Generating the SVG diagrams requires an invocation of "dot" per event,
so it noticeable slows down diagnostic emission, hence the opt-in
command-line flag.  However, I'm already finding bugs in -fanalyzer with
this that I hadn't seen before.

Given that the UI is rather clunky and there is lots of room for
improvement to the visualizations, for now this feature is marked
as being for GCC developers, not end-users.

The patch also adds a dot::ast_node class hierarachy to make it easy to
create GraphViz dot files with the correct escaping, and adds a C++
wrapper around pex adding some syntactic sugar for invoking
subprocesses.

gcc/ChangeLog:
PR other/116792
* Makefile.in (ANALYZER_OBJS): Add
analyzer/ana-state-to-diagnostic-state.o.
(OBJS): Move graphviz.o to...
(OBJS-libcommon): ...here.  Add diagnostic-state-to-dot.o and pex.o.
* diagnostic-format-html.cc: Include "diagnostic-state.h" and
"graphviz.h".
(html_generation_options::html_generation_options): Initialize the
new flags.
(HTML_SCRIPT): Add function "get_any_state_diagram".  Use it
when changing current focus id to update the visibility of the
pertinent diagram, if any.
(print_pre_source): New.
(html_builder::maybe_make_state_diagram): New.
(html_path_label_writer::html_path_label_writer): Add "path" param.
Initialize m_path and m_curr_event_id.
(html_path_label_writer::begin_label): Store current event id.
(html_path_label_writer::end_label): Attempt to make a state
diagram and add it if successful.
(html_path_label_writer::get_element_id): New.
(html_path_label_writer::m_path): New field.
(html_path_label_writer::m_curr_event_id): New field.
(html_builder::make_element_for_diagnostic): Pass path to label
writer.
* diagnostic-format-html.h
(html_generation_options::m_show_state_diagrams): New field.
(html_generation_options::m_show_state_diagram_xml): New field.
(html_generation_options::m_show_state_diagram_dot_src): New field.
* diagnostic-format-sarif.cc: Include "xml.h".
(populate_thread_flow_location_object): If requested, attempt to
generate xml state and add it to the proeprty bag as
"gcc/diagnostic_event/xml_state" in xml source form.
(sarif_generation_options::sarif_generation_options): Initialize
m_xml_state.
* diagnostic-format-sarif.h
(sarif_generation_options::m_xml_state): New field.
* diagnostic-path.cc: Define INCLUDE_MAP.  Include "xml.h".
(diagnostic_event::maybe_make_xml_state): New.
* diagnostic-path.h (class xml::document): New forward decl.
(diagnostic_event::maybe_make_xml_state): New vfunc decl.
* diagnostic-state-to-dot.cc: New file.
* diagnostic-state.h: New file.
* digraph.cc: Define INCLUDE_STRING and INCLUDE_VECTOR.
* doc/analyzer.texi: Document state diagrams in html output.
(__analyzer_dump_dot): New.
(__analyzer_dump_xml): New.
* doc/invoke.texi (sarif): Add "xml-state" key.
(experimental-html): Add keys "show-state-diagrams",
"show-state-diagrams-dot-src" and "show-state-diagrams-xml".
* graphviz.cc: Define INCLUDE_MAP, INCLUDE_STRING, and
INCLUDE_VECTOR.  Include "xml.h", "xml-printer.h", "pex.h" and
"selftest.h".
(graphviz_out::graphviz_out): Extract...
(dot::writer::writer): ...this.
(graphviz_out::write_indent): Convert to...
(dot::writer::write_indent): ...this.
(graphviz_out::print): Use get_pp.
(graphviz_out::println): Likewise.
(graphviz_out::begin_tr): Likewise.
(graphviz_out::end_tr): Likewise.
(graphviz_out::begin_td): Likewise.
(graphviz_out::end_td): Likewise.
(graphviz_out::begin_trtd): Likewise.
(graphviz_out::end_tdtr): Likewise.
(dot::ast_node::dump): New.
(dot::id::id): New.
(dot::id::print): New.
(dot::id::is_identifier_p): New.
(dot::kv_pair::print): New.
(dot::attr_list::print): New.
(dot::stmt_list::print): New.
(dot::stmt_list::add_edge): New.
(dot::stmt_list::add_attr): New.
(dot::graph::print): New.
(dot::stmt_with_attr_list::set_label): New.
(dot::node_stmt::print): New.
(dot::attr_stmt::print): New.
(dot::kv_stmt::print): New.
(dot::node_id::print): New.
(dot::port::print): New.
(dot::edge_stmt::print): New.
(dot::subgraph::print): New.
(dot::make_svg_document_buffer_from_graph): New.
(dot::make_svg_from_graph): New.
(selftest:test_ids): New.
(selftest:test_trivial_graph): New.
(selftest:test_layout_example): New.
(selftest:graphviz_cc_tests): New.
* graphviz.h (xml::node): New forward decl.
(class graphviz_out): Split out into...
(class dot::writer): ...this new class
(struct dot::ast_node): New.
(struct dot::id): New.
(struct dot::kv_pair): New.
(struct dot::attr_list): New.
(struct dot::stmt_list): New.
(struct dot::graph): New.
(struct dot::stmt): New.
(struct dot::stmt_with_attr_list): New.
(struct dot::node_stmt): New.
(struct dot::attr_stmt): New.
(struct dot::kv_stmt): New.
(enum class dot::compass_pt): New.
(struct dot::port): New.
(struct dot::node_id): New.
(struct dot::edge_stmt): New.
(struct dot::subgraph): New.
(dot::make_svg_from_graph): New.
* opts-diagnostic.cc (sarif_scheme_handler::make_sink): Add
"xml-state" flag.
(html_scheme_handler::make_sink): Add flags "show-state-diagrams",
"show-state-diagram-dot-src", and "show-state-diagram-xml".
* pex.cc: New file.
* pex.h: New file.
* selftest-run-tests.cc (selftest::run_tests): Call
graphviz_cc_tests.
* selftest.h (selftest::graphviz_cc_tests): New decl.
* xml.cc (xml::node_with_children::add_comment): New.
(xml::node_with_children::find_child_element): New.
(xml::element::get_attr): New.
(xml::comment::write_as_xml): New.
(selftest::test_printer): Add coverage of find_child_element and
get_attr.
(selftest::test_comment): New.
(selftest::xml_cc_tests): Call test_comment.
* xml.h: New forward decls.
(xml::node::dyn_cast_text): Use nullptr.
(xml::node::dyn_cast_element): New vfunc.
(xml::node_with_children::add_comment): New decl.
(xml::node_with_children::find_child_element): New decl.
(xml::element::dyn_cast_element): New vfunc impl.
(xml::element::get_attr): New decl.
(struct xml::comment): New xml::node subclass.

gcc/analyzer/ChangeLog:
PR other/116792
* ana-state-to-diagnostic-state.cc: New file.
* ana-state-to-diagnostic-state.h: New file.
* checker-event.cc: Include "xml.h".
(checker_event::checker_event): Initialize m_path.
(checker_event::prepare_for_emission): Store the path pointer into
m_path.
(checker_event::maybe_make_xml_state): New.
(function_entry_event::function_entry_event): Add "state" param
and use it to initialize m_state.
(superedge_event::get_program_state): New.
(call_event::get_program_state): New.
(warning_event::get_program_state): New.
* checker-event.h (checker_event::get_program_state): New vfunc.
(checker_event::maybe_make_xml_state): New decl.
(checker_event::m_path): New field.
(statement_event::get_program_state): New vfunc impl.
(function_entry_event::function_entry_event): Add "state" param.
(function_entry_event::get_program_state): New vfunc impl.
(function_entry_event::m_state): New field.
(state_change_event::get_program_state): New vfunc impl.
(superedge_event::get_program_state): New vfunc decl.
(warning_event::warning_event): Add "program_state_" param and
copy it.
(warning_event::get_program_state): New vfunc decl.
(warning_event::m_program_state): New field.
* checker-path.h (checker_path::checker_path): Add ext_state param.
(checker_path::get_ext_state): New accessor.
(checker_path::m_ext_state): New field.
* common.h: Define INCLUDE_MAP and INCLUDE_STRING.
* diagnostic-manager.cc (saved_diagnostic::operator==): Don't
deduplicate dump_path_diagnostic instances.
(diagnostic_manager::emit_saved_diagnostic): Pass ext_state to
checker_path ctor.
* engine.cc:
(impl_region_model_context::on_state_leak): Pass old and new state
to state_machine::on_leak.
(exploded_node::on_stmt_pre): Implement __analyzer_dump_xml and
__analyzer_dump_dot.
* exploded-graph.h (impl_region_model_context::get_state): New.
* infinite-recursion.cc
(recursive_function_entry_event::recursive_function_entry_event):
Add "dst_state" param and pass to function_entry_event ctor.
(infinite_recursion_diagnostic::add_function_entry_event): Pass state
to event ctor.
* kf-analyzer.cc: Include "analyzer/program-state.h"
(dump_path_diagnostic::dump_path_diagnostic): Add "state" param.
(dump_path_diagnostic::get_final_state): New.
(dump_path_diagnostic::m_state): New field.
(kf_analyzer_dump_path::impl_call_pre): Pass state to warning.
* pending-diagnostic.cc
(pending_diagnostic::add_function_entry_event): Pass state to
function_entry_event.
(pending_diagnostic::add_final_event): Likewise to warning_event.
* pending-diagnostic.h (pending_diagnostic::get_final_state): New
vfunc decl.
* program-state.cc: Include "diagnostic-state.h", "graphviz.h" and
"analyzer/ana-state-to-diagnostic-state.h".
(program_state::dump_dot): New.
* program-state.h: Include "text-art/tree-widget.h" and
"analyzer/store.h".
(class xml::document): New forward decl.
(make_xml): New.
(dump_xml_to_pp): New.
(dump_xml_to_file): New.
(dump_xml): New.
(dump_dot): New.
* record-layout.cc (record_layout::record_layout): Make param
const_tree.
* record-layout.h (item::item): Likewise.
(item::m_field): Likewise.
(record_layout::record_layout): Likewise.
(record_layout::begin): New.
(record_layout::end): New.
* region-model.cc
(exposure_through_uninit_copy::complain_about_fully_uninit_item):
Use const_tree.
(exposure_through_uninit_copy::complain_about_partially_uninit_item):
Likewise.
* region-model.h (region_model_context::get_state): New vfunc.
(noop_region_model_context::get_state): New.
(region_model_context_decorator::get_state): New.
* sm-fd.cc (fd_leak::fd_leak): Add "final_state" param and capture
it if present.
(fd_leak::get_final_state): New.
(fd_leak::m_final_state): New.
(fd_state_machine::on_open): Pass nullptr for new "final_state"
param.
(fd_state_machine::on_creat): Likewise.
(fd_state_machine::on_socket): Likewise.
(fd_state_machine::on_accept): Likewise.
(fd_state_machine::on_leak): Add state params and pass new state
as final state to fd_leak ctor.
* sm-file.cc: Include "analyzer/program-state.h".
(file_leak::file_leak): Add "final_state" param and capture it if
present.
(file_leak::get_final_state): New.
(file_leak::m_final_state): New.
(fileptr_state_machine::on_leak): Add state params and pass new
state as final state to fd_leak ctor.
* sm-malloc.cc: Include
"analyzer/ana-state-to-diagnostic-state.h".
(malloc_leak::malloc_leak): Add "final_state" param and use it.
(malloc_leak::get_final_state): New vfunc impl.
(malloc_leak::m_final_state): New field.
(malloc_state_machine::on_leak): Add state params; capture final
state.
(malloc_state_machine::add_state_to_xml): New.
* sm.cc (state_machine::on_leak): Add "old_state" and "new_state"
params.  Use nullptr.
(state_machine::add_state_to_xml): New.
(state_machine::add_global_state_to_xml): New.
* sm.h (class xml_state): New forward decl.
(state_machine::on_leak): Add state params.
(state_machine::add_state_to_xml): New vfunc decl.
(state_machine::add_global_state_to_xml): New vfunc decl.
* store.h (bit_range::operator<): New.
* varargs.cc (va_list_leak::va_list_leak): Add final_state param
and capture it if non-null.
(va_list_leak::get_final_state): New.
(va_list_leak::m_final_state): New.
(va_list_state_machine::on_leak): Add state params and pass final
state to va_list_leak ctor.

gcc/testsuite/ChangeLog:
PR other/116792
* g++.dg/analyzer/state-diagram.C: New test.
* gcc.dg/analyzer/analyzer-decls.h (__analyzer_dump_dot): New
decl.
(__analyzer_dump_xml): New decl.
* gcc.dg/analyzer/state-diagram-1-sarif.py: New test script.
* gcc.dg/analyzer/state-diagram-1.c: New test.
* gcc.dg/analyzer/state-diagram-2.c: New test.
* gcc.dg/analyzer/state-diagram-3.c: New test.
* gcc.dg/analyzer/state-diagram-4.c: New test.
* gcc.dg/analyzer/state-diagram-5-html.py: New test script.
* gcc.dg/analyzer/state-diagram-5-sarif.py: New test script.
* gcc.dg/analyzer/state-diagram-5.c: New test.
* gcc.dg/plugin/analyzer_cpython_plugin.cc: Define INCLUDE_STRING.
* gcc.dg/plugin/analyzer_gil_plugin.cc: Likewise.
* gcc.dg/plugin/analyzer_kernel_plugin.cc: Likewise.
* gcc.dg/plugin/analyzer_known_fns_plugin.cc: Likewise.
* lib/htmltest.py (ns): Add SVG namespace.
* lib/sarif.py (get_result_by_index): New.
(get_xml_state): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
11 days agodiagnostics: handle pp_token::kind::event_id in experimental-html sink [PR116792]
David Malcolm [Mon, 23 Jun 2025 15:06:33 +0000 (11:06 -0400)] 
diagnostics: handle pp_token::kind::event_id in experimental-html sink [PR116792]

gcc/ChangeLog:
PR other/116792
* diagnostic-format-html.cc (html_token_printer::print_tokens):
Handle pp_token::kind::event_id.
(selftest::test_token_printer): Add coverage of printing an event
id.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
11 days agoRISC-V: Add test for vec_duplicate + vsaddu.vv combine case 1 with GR2VR cost 0,...
Pan Li [Sat, 21 Jun 2025 02:07:38 +0000 (10:07 +0800)] 
RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vsaddu.vv combine to
vsaddu.vx, with the GR2VR cost is 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vsaddu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 days agoRISC-V: Add test for vec_duplicate + vsaddu.vv combine case 0 with GR2VR cost 0,...
Pan Li [Sat, 21 Jun 2025 01:10:07 +0000 (09:10 +0800)] 
RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 0 with GR2VR cost 0, 2 and 15

Add asm dump check and run test for vec_duplicate + vsaddu.vv
combine to vsaddu.vx, with the GR2VR cost is 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 days agoRISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost
Pan Li [Sat, 21 Jun 2025 01:00:16 +0000 (09:00 +0800)] 
RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost

This patch would like to combine the vec_duplicate + vsaddu.vv to the
vsaddu.vx.  From example as below code.  The related pattern will depend
on the cost of vec_duplicate from GR2VR.  Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.

Assume we have example code like below, GR2VR cost is 0.

  #define DEF_VX_BINARY(T, FUNC)                                      \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = FUNC (in[i], x);                                       \
  }

  T sat_add(T a, T b)
  {
    return (a + b) | (-(T)((T)(a + b) < a));
  }

  DEF_VX_BINARY(uint32_t, sat_add)

Before this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │     beq a3,zero,.L8
  12   │     vsetvli a5,zero,e32,m1,ta,ma
  13   │     vmv.v.x v2,a2
  14   │     slli    a3,a3,32
  15   │     srli    a3,a3,32
  16   │ .L3:
  17   │     vsetvli a5,a3,e32,m1,ta,ma
  18   │     vle32.v v1,0(a1)
  19   │     slli    a4,a5,2
  20   │     sub a3,a3,a5
  21   │     add a1,a1,a4
  22   │     vsaddu.vv v1,v1,v2
  23   │     vse32.v v1,0(a0)
  24   │     add a0,a0,a4
  25   │     bne a3,zero,.L3

After this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │     beq a3,zero,.L8
  12   │     slli    a3,a3,32
  13   │     srli    a3,a3,32
  14   │ .L3:
  15   │     vsetvli a5,a3,e32,m1,ta,ma
  16   │     vle32.v v1,0(a1)
  17   │     slli    a4,a5,2
  18   │     sub a3,a3,a5
  19   │     add a1,a1,a4
  20   │     vsaddu.vx v1,v1,a2
  21   │     vse32.v v1,0(a0)
  22   │     add a0,a0,a4
  23   │     bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case US_PLUS.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op us_plus.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 days agotailc: Allow musttail tail calls with -fsanitize=address [PR120608]
Jakub Jelinek [Mon, 23 Jun 2025 14:08:34 +0000 (16:08 +0200)] 
tailc: Allow musttail tail calls with -fsanitize=address [PR120608]

These testcases show another problem with -fsanitize=address
vs. musttail tail calls.  In particular, there can be
  .ASAN_MARK (POISON, &a, 4);
etc. calls after a tail call and those just prevent the tailc pass
to mark the musttail calls as [tail call].
Normally, the sanopt pass (which comes after tailc) will optimize those
away, the optimization is if there are no .ASAN_CHECK calls or normal
function calls dominated by those .ASAN_MARK (POSION, ...) calls, the
poison is not needed, because in the epilog sequence (the one dealt with
in the patch posted earlier today) all the stack slots are unpoisoned anyway
(or poisoned for use-after-return).
Unlike __builtin_tsan_exit_function, .ASAN_MARK is not a real function
and is always expanded inline, so can be never tail called successfully,
so the patch just ignores those for the cfun->has_musttail && diag_musttail
cases.  If there is a non-musttail call, it will fail worst case during
expansion because there is the epilog asan sequence.

2025-06-12  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/120608
* tree-tailcall.cc (empty_eh_cleanup): Ignore .ASAN_MARK (POISON)
internal calls for the cfun->has_musttail case and diag_musttail.
(find_tail_calls): Likewise.

* c-c++-common/asan/pr120608-1.c: New test.
* c-c++-common/asan/pr120608-2.c: New test.

11 days agoexpand: Allow musttail tail calls with -fsanitize=address [PR120608]
Jakub Jelinek [Mon, 23 Jun 2025 13:58:55 +0000 (15:58 +0200)] 
expand: Allow musttail tail calls with -fsanitize=address [PR120608]

The following testcase is rejected by GCC 15 but accepted (with
s/gnu/clang/) by clang.
The problem is that we want to execute a sequence of instructions to
unpoison all automatic variables in the function and mark the var block
allocated for use-after-return sanitization poisoned after the call,
so we were just disabling tail calls if there are any instructions
returned from asan_emit_stack_protection.
It is fine and necessary for normal tail calls, but for musttail
tail calls we actually document that accessing the automatic vars of
the caller is UB as if they end their lifetime right before the tail
call, so we also want address sanitizer user-after-return to diagnose
that.

The following patch will only disable normal tail calls when that sequence
is present, for musttail it will arrange to emit a copy of that sequence
before the tail call sequence.  That sequence only tweaks the shadow memory
and nothing in the code emitted by call expansion should touch the shadow
memory, so it is ok to emit it already before argument setup.

2025-06-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/120608
* cfgexpand.cc: Include rtl-iter.h.
(expand_gimple_tailcall): Add ASAN_EPILOG_SEQ argument, if non-NULL
and expand_gimple_stmt emitted a tail call, emit a copy of that
insn sequence before the call sequence.
(expand_gimple_basic_block): Remove DISABLE_TAIL_CALLS argument, add
ASAN_EPILOG_SEQ argument.  Disable tail call flag only on non-musttail
calls if that flag is set, pass it to expand_gimple_tailcall.
(pass_expand::execute): Pass VAR_RET_SEQ directly as last
expand_gimple_basic_block argument rather than its comparison with
NULL.

* g++.dg/asan/pr120608.C: New test.

12 days agovect: Use combined peeling and versioning for mutually aligned DRs
Pengfei Li [Wed, 11 Jun 2025 15:01:36 +0000 (15:01 +0000)] 
vect: Use combined peeling and versioning for mutually aligned DRs

Current GCC uses either peeling or versioning, but not in combination,
to handle unaligned data references (DRs) during vectorization. This
limitation causes some loops with early break to fall back to scalar
code at runtime.

Consider the following loop with DRs in its early break condition:

for (int i = start; i < end; i++) {
  if (a[i] == b[i])
    break;
  count++;
}

In the loop, references to a[] and b[] need to be strictly aligned for
vectorization because speculative reads that may cross page boundaries
are not allowed. Current GCC does versioning for this loop by creating a
runtime check like:

((&a[start] | &b[start]) & mask) == 0

to see if two initial addresses both have lower bits zeros. If above
runtime check fails, the loop will fall back to scalar code. However,
it's often possible that DRs are all unaligned at the beginning but they
become all aligned after a few loop iterations. We call this situation
DRs being "mutually aligned".

This patch enables combined peeling and versioning to avoid loops with
mutually aligned DRs falling back to scalar code. Specifically, the
function vect_peeling_supportable is updated in this patch to return a
three-state enum indicating how peeling can make all unsupportable DRs
aligned. In addition to previous true/false return values, a new state
peeling_maybe_supported is used to indicate that peeling may be able to
make these DRs aligned but we are not sure about it at compile time. In
this case, peeling should be combined with versioning so that a runtime
check will be generated to guard the peeled vectorized loop.

A new type of runtime check is also introduced for combined peeling and
versioning. It's enabled when LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true.
The new check tests if all DRs recorded in LOOP_VINFO_MAY_MISALIGN_STMTS
have the same lower address bits. For above loop case, the new test will
generate an XOR between two addresses, like:

((&a[start] ^ &b[start]) & mask) == 0

Therefore, if a and b have the same alignment step (element size) and
the same offset from an alignment boundary, a peeled vectorized loop
will run. This new runtime check also works for >2 DRs, with the LHS
expression being:

((a1 ^ a2) | (a2 ^ a3) | (a3 ^ a4) | ... | (an-1 ^ an)) & mask

where ai is the address of i'th DR.

This patch is bootstrapped and regression tested on x86_64-linux-gnu,
arm-linux-gnueabihf and aarch64-linux-gnu.

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_peeling_supportable): Return new
enum values to indicate if combined peeling and versioning can
potentially support vectorization.
(vect_enhance_data_refs_alignment): Support combined peeling and
versioning in vectorization analysis.
* tree-vect-loop-manip.cc (vect_create_cond_for_align_checks):
Add a new type of runtime check for mutually aligned DRs.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Set
default value of allow_mutual_alignment in the initializer list.
* tree-vectorizer.h (enum peeling_support): Define type of
peeling support for function vect_peeling_supportable.
(LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT): New access macro.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_133_pfa6.c: Adjust test.

12 days agomatch: Simplify doubled not, negate and conjugate operators to a non-lvalue
Mikael Morin [Sat, 21 Jun 2025 18:12:31 +0000 (20:12 +0200)] 
match: Simplify doubled not, negate and conjugate operators to a non-lvalue

gcc/ChangeLog:

* match.pd (`-(-X)`, `~(~X)`, `conj(conj(X))`): Add a
NON_LVALUE_EXPR wrapper to the simplification of doubled unary
operators NEGATE_EXPR, BIT_NOT_EXPR and CONJ_EXPR.

gcc/testsuite/ChangeLog:

* gfortran.dg/non_lvalue_1.f90: New test.

12 days agotree-optimization/120729 - limit compile time in uninit_analysis::prune_phi_opnds
Richard Biener [Fri, 20 Jun 2025 13:07:20 +0000 (15:07 +0200)] 
tree-optimization/120729 - limit compile time in uninit_analysis::prune_phi_opnds

The testcase in this PR shows, on the GCC 14 branch, that in some
degenerate cases we can spend exponential time pruning always
initialized paths through a web of PHIs.  The following adds
--param uninit-max-prune-work, defaulted to 100000, to limit that
to effectively O(1).

PR tree-optimization/120729
* gimple-predicate-analysis.h (uninit_analysis::prune_phi_opnds):
Add argument of work budget remaining.
* gimple-predicate-analysis.cc (uninit_analysis::prune_phi_opnds):
Likewise.  Maintain and honor it throughout the recursion.
* params.opt (uninit-max-prune-work): New.
* doc/invoke.texi (uninit-max-prune-work): Document.

12 days agovregs: Use force_subreg when instantiating subregs [PR120721]
Richard Sandiford [Mon, 23 Jun 2025 07:46:27 +0000 (08:46 +0100)] 
vregs: Use force_subreg when instantiating subregs [PR120721]

In this PR, we started with:

    (subreg:V2DI (reg:DI virtual-reg) 0)

and vregs instantiated the virtual register to the argument pointer.
But:

    (subreg:V2DI (reg:DI ap) 0)

is not a sensible subreg, since the argument pointer certainly can't
be referenced in V2DImode.  This is (IMO correctly) rejected after
g:2dcc6dbd8a00caf7cfa8cac17b3fd1c33d658016.

The vregs code that instantiates the subreg above is specific to
rvalues and already creates new instructions for nonzero offsets.
It is therefore safe to use force_subreg instead of simplify_gen_subreg.

I did wonder whether we should instead say that a subreg of a
virtual register is invalid if the same subreg would be invalid
for the associated hard registers.  But the point of virtual registers
is that the offsets from the hard registers are not known until after
expand has finished, and if an offset is nonzero, the virtual register
will be instantiated into a pseudo that contains the sum of the hard
register and the offset.  The subreg would then be correct for that
pseudo.  The subreg is only invalid in this case because there is
no offset.

gcc/
PR rtl-optimization/120721
* function.cc (instantiate_virtual_regs_in_insn): Use force_subreg
instead of simplify_gen_subreg when instantiating an rvalue SUBREG.

gcc/testsuite/
PR rtl-optimization/120721
* g++.dg/torture/pr120721.C: New test.

12 days agox86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers
H.J. Lu [Fri, 20 Jun 2025 08:07:18 +0000 (16:07 +0800)] 
x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers

Don't use vmovdqu16/vmovdqu8 with non-EVEX register operands just because
AVX512BW is available.

gcc/

PR target/120728
* config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovdqu8
only with EVEX register operands.

gcc/testsuite/

PR target/120728
* gcc.target/i386/avx512bw-vmovdqu16-1.c: Scan vmovdqu for
non-EVEX register operands.
* gcc.target/i386/avx512bw-vmovdqu8-1.c: Likewise.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr95483-5.c: Likewise.
* gcc.target/i386/pr120728.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
12 days agox86: Add PROCESSOR_XXX comments to processor_cost_table
H.J. Lu [Mon, 23 Jun 2025 02:55:49 +0000 (10:55 +0800)] 
x86: Add PROCESSOR_XXX comments to processor_cost_table

Add a PROCESSOR_XXX comment to each entry in processor_cost_table to
describe which processor the cost enry is applied to.

* config/i386/i386-options.cc (processor_cost_table): Add a
PROCESSOR_XXX comment to each entry.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
12 days agoDaily bump.
GCC Administrator [Mon, 23 Jun 2025 00:16:33 +0000 (00:16 +0000)] 
Daily bump.

12 days agoAda: Replace hardcoded GNAT commands for GNAT tools
Nicolas Boulenguez [Sun, 22 Jun 2025 22:37:35 +0000 (00:37 +0200)] 
Ada: Replace hardcoded GNAT commands for GNAT tools

This replaces the hardcoded gnat{make,link,bind,ls} commands with expansion
of the GNAT{MAKE,BIND} variables computed by the configure machinery, during
the build of the GNAT tools.

The default GNATMAKE_FOR_HOST duplicates the default GNATMAKE, and someone
setting GNATMAKE in the toplevel configuration may want it applied for all
host compilations.  Direct assignment of GNATMAKE_FOR_HOST keeps working.

gcc/ada/
PR ada/120106
* gcc-interface/Make-lang.in: Set GNAT{MAKE,BIND,LINK_LS}_FOR_HOST
from GNAT{MAKE,BIND} instead of using hardcoded commands.
gnattools/
PR ada/120106
* configure.ac: Remove ACX_NONCANONICAL_HOST and add ACX_PROG_GNAT.
* configure: Regenerate.
* Makefile.in: Do not substitute host_noncanonical but substitute
GNATMAKE and GNATBIND.
Set GNAT{MAKE,BIND,LINK_LS}_FOR_HOST from GNAT{MAKE,BIND} instead
of using hardcoded commands.

12 days agoAda: Remove obsolete stuff in Makefile fragment
Eric Botcazou [Sun, 22 Jun 2025 18:35:36 +0000 (20:35 +0200)] 
Ada: Remove obsolete stuff in Makefile fragment

gcc/ada/
* Make-generated.in: Remove obsolete stuff.

12 days agoAda: Introduce GNATMAKE_FOR_BUILD Makefile variable
Nicolas Boulenguez [Sun, 22 Jun 2025 17:23:11 +0000 (19:23 +0200)] 
Ada: Introduce GNATMAKE_FOR_BUILD Makefile variable

This gets rid of the hardcoded 'gnatmake' command used during the build.

/
PR ada/120106
* Makefile.tpl: Add GNATMAKE_FOR_BUILD to {HOST,BASE_TARGET}_EXPORTS
* Makefile.in: Regenerate.
* configure.ac: Set the default and substitute the variable.
* configure: Regenerate.
gcc/ada/
PR ada/120106
* Make-generated.in: Use GNATMAKE_FOR_BUILD instead of gnatmake.
* gcc-interface/Makefile.in: Likewise.

12 days ago[RISC-V][PR target/119830] Fix RISC-V codegen on 32bit hosts
Andrew Pinski [Sun, 22 Jun 2025 18:35:19 +0000 (12:35 -0600)] 
[RISC-V][PR target/119830] Fix RISC-V codegen on 32bit hosts

So this is Andrew's patch from the PR.  We weren't clean for a 32bit host in
some of the arithmetic for constant synthesis.

I confirmed the bug on a 32bit linux host, then confirmed that Andrew's patch
from the PR fixes the problem, then ran Andrew's patch through my tester
successfully.

Naturally I'll wait for pre-commit testing, but I'm not expecting problems.

PR target/119830
gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Make arithmetic in bclr case
clean for 32 bit hosts.

gcc/testsuite/
* gcc.target/riscv/pr119830.c: New test.

12 days ago[committed][PR rtl-optimization/120550] Drop REG_EQUAL note after ext-dce transformation
Jeff Law [Sun, 22 Jun 2025 18:06:08 +0000 (12:06 -0600)] 
[committed][PR rtl-optimization/120550] Drop REG_EQUAL note after ext-dce transformation

This bug was found by Edwin's fuzzing efforts on RISC-V, though it likely
affects other targets.

In simplest terms when ext-dce converts an extension into a (possibly
simplified) subreg copy it may make an attached REG_EQUAL note invalid.

In the case Edwin found the note was an extension, but I don't think that would
necessarily always be the case.  The note could have other forms which
potentially need invalidation.  So the safest thing to do is just remove any
attached REG_EQUAL or REG_EQUIV note.

Note adjusting Edwin's testcase in the obvious way to avoid having to interpret
printf output for pass/fail status makes the bug go latent.  That's why no
testcase is included with this patch.

Bootstrapped and regression tested on x86_64.  Obviously also verified it fixes
the testcase Edwin filed.

This is a good candidate for cherry-picking to the gcc-15 release branch after
simmering on the trunk a bit.

PR rtl-optimization/120550
gcc/
* ext-dce.cc (ext_dce_try_optimize_insn): Drop REG_EQUAL/REG_EQUIV
notes on modified insns.

13 days agoxtensa: Make use of DEPBITS instruction
Takayuki 'January June' Suwa [Tue, 17 Jun 2025 06:56:52 +0000 (15:56 +0900)] 
xtensa: Make use of DEPBITS instruction

This patch implements bitfield insertion MD pattern using the DEPBITS
machine instruction, the counterpart of the EXTUI instruction, if
available.

     /* example */
     struct foo {
       unsigned int b:10;
       unsigned int r:11;
       unsigned int g:11;
     };
     void test(struct foo *p) {
       p->g >>= 1;
     }

     ;; result (endianness: little)
     test:
      entry sp, 32
      l32i.n a8, a2, 0
      extui a9, a8, 1, 10
      depbits a8, a9, 0, 11
      s32i.n a8, a2, 0
      retw.n

gcc/ChangeLog:

* config/xtensa/xtensa.h (TARGET_DEPBITS): New macro.
* config/xtensa/xtensa.md (insvsi): New insn pattern.

13 days agoxtensa: Implement TARGET_ZERO_CALL_USED_REGS
Takayuki 'January June' Suwa [Mon, 16 Jun 2025 13:33:36 +0000 (22:33 +0900)] 
xtensa: Implement TARGET_ZERO_CALL_USED_REGS

This patch implements the target-specific ZERO_CALL_USED_REGS hook, since
if -fzero-call-used-regs=all the default hook tries to assign 0 to B0
(bit 0 of the BR register) and the ICE will be thrown.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_zero_call_used_regs):
New prototype and function.
(TARGET_ZERO_CALL_USED_REGS): Define macro.

13 days agoFix some problems with afdo propagation
Jan Hubicka [Sun, 22 Jun 2025 09:06:12 +0000 (11:06 +0200)] 
Fix some problems with afdo propagation

This patch fixes problems I noticed by exploring profiles of some hot
functions in GCC.  In particular the propagation sometimes changed
precise 0 to afdo 0 for paths calling abort and sometimes we could
propagate more when we accept that some paths has 0 count.
Finally there was important bug in computing all_known which
resulted in BB probabilities to be quite broken after afdo.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

* auto-profile.cc (update_count_by_afdo_count): Make static;
add variant accepting profile_count.
(afdo_find_equiv_class): Use update_count_by_afdo_count.
(afdo_propagate_edge): Likewise.
(afdo_propagate): Likewise.
(afdo_calculate_branch_prob): Fix handling of all_known.
(afdo_annotate_cfg): Annotate by 0 where both afdo and static
profile agrees.

13 days agoHandle functions with 0 profile in auto-profile
Jan Hubicka [Sun, 22 Jun 2025 04:55:41 +0000 (06:55 +0200)] 
Handle functions with 0 profile in auto-profile

This is the last part of the infrastructure to allow functions with
local profiles and 0 global autofdo counts.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

* auto-profile.cc (afdo_set_bb_count): Dump inline stacks
and reasons when lookup failed.
(afdo_set_bb_count): Record info about BBs with zero AFDO count.
(afdo_annotate_cfg): Set profile to global0_afdo if there are
no samples in profile.

13 days ago[PR modula2/120731] error in Strings.Pos causing sigsegv
Gaius Mulley [Sun, 22 Jun 2025 03:13:26 +0000 (04:13 +0100)] 
[PR modula2/120731] error in Strings.Pos causing sigsegv

This patch corrects the m2log library procedure function
Strings.Pos which incorrectly sliced the wrong component
of the source string.  The incorrect slice could cause
a sigsegv if negative slice indices were generated.

gcc/m2/ChangeLog:

PR modula2/120731
* gm2-libs-log/Strings.def (Delete): Rewrite comment.
* gm2-libs-log/Strings.mod (Pos): Rewrite.
(PosLower): New procedure function.

gcc/testsuite/ChangeLog:

PR modula2/120731
* gm2/pimlib/logitech/run/pass/teststrings.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
13 days agoPrevent possible overflows in ipa-profile
Jan Hubicka [Sun, 22 Jun 2025 01:32:29 +0000 (03:32 +0200)] 
Prevent possible overflows in ipa-profile

The bug in scaling profile of fnsplit produced clones made
some afdo counts during gcc bootstrap very large (2^59).
This made computations in ipa-profile to silently overflow
which triggered hot count to be identified as 1 instead of
sane value.

While fixing the fnsplit bug prevents overflow, I think the histogram code
should be made safe too.  sreal is not very fitting here since mantisa is 32bit
and not very good for many additions of many numbers which are possibly of very
different order.  So I use widest_int while 128bit arithmetics would be safe
(we are summing 60 bit counts multiplied by time estimates).  I don't think
we have readily available 128bit type and code is not really time critical since
the histogram is computed once.

gcc/ChangeLog:

* ipa-profile.cc (ipa_profile): Use widest_int to avoid
possible overflows.

13 days agoScale up auto-profile counts
Jan Hubicka [Sun, 22 Jun 2025 01:26:36 +0000 (03:26 +0200)] 
Scale up auto-profile counts

This patch makes auto-profile counts to scale up when the train run has
too small maximal count.  This still happens on moderately sized train runs
such as SPEC2017 ref datasets and helps to avoid rounding errors in cases
where number of samples is small.

gcc/ChangeLog:

* auto-profile.cc (autofdo::afdo_count_scale): New.
(autofdo_source_profile::update_inlined_ind_target): Scale
counts.
(autofdo_source_profile::read): Set scale and dump
statistics.
(afdo_indirect_call): Scale.
(afdo_set_bb_count): Scale.
(afdo_find_equiv_class): Fix dumps.
(afdo_annotate_cfg): Scale.

13 days agoAdd GUESSED_GLOBAL0_AFDO
Jan Hubicka [Sun, 22 Jun 2025 01:12:55 +0000 (03:12 +0200)] 
Add GUESSED_GLOBAL0_AFDO

This patch adds GUESSED_GLOBAL0_AFDO profile quality. It can
be used to preserve local counts of functions which have 0 AFDO
profile.

I originally did not include it as it was not clear it will be useful and
it turns quality from 3bits to 4bits which means that we need to steal another
bit from the actual counters.

It is likely not a problem for profile_count since counting up to 2^60 still
takes a while.  However with profile_probability I run into a problem that
gimple FE testcases encodes probability with current representation and thus
changing profile_probability::n_bits would require updating all testcases.

Since probabilities never use GLOBAL0 qualities (they are always local)
addoing bits is not necessary, but it requires encoding quality in
the data type that required adding accessors.

While working on this I also noticed that GUESSED_GLOBAL0 and
GUESSED_GLOBAL0_ADJUSED are misordered. Qualities should be in
increasing order.  This is also fixed.

auto-profile will be updated later.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

gcc/ChangeLog:

* cgraph.cc (cgraph_node::make_profile_global0): Support
GUESSED_GLOBAL0_AFDO
* ipa-cp.cc (update_profiling_info): Use
GUESSED_GLOBAL0_AFDO.
* profile-count.cc (profile_probability::dump): Use
quality ().
(profile_probability::stream_in): Use m_adjusted_quality.
(profile_probability::stream_out): Use m_adjusted_quality.
(profile_count::combine_with_ipa_count): Use quality ().
(profile_probability::sqrt): Likewise.
* profile-count.h (enum profile_quality): Add
GUESSED_GLOBAL0_AFDO; reoder GUESSED_GLOBAL0_ADJUSTED and
GUESSED_GLOBAL0.
(profile_probability): Add min_quality; replase m_quality
by m_adjused_quality; add set_quality; update all users
of quality.
(profile_count): Set n_bits to 60; make m_quality 4 bits;
update uses of quality.
(profile_count::afdo_zero, profile_count::globa0afdo): New.

13 days agoDaily bump.
GCC Administrator [Sun, 22 Jun 2025 00:18:00 +0000 (00:18 +0000)] 
Daily bump.

13 days agoFix profile after fnsplit
Jan Hubicka [Sat, 21 Jun 2025 20:29:50 +0000 (22:29 +0200)] 
Fix profile after fnsplit

when splitting functions, tree-inline determined correctly entry count of the
new function part, but then in case entry block of new function part is in a
loop it scales body which is not suposed to happen.

* tree-inline.cc (copy_cfg_body): Fix profile of split functions.

13 days ago[modula2] Comment tidyup in gm2-compiler/M2GCCDeclare.mod
Gaius Mulley [Sat, 21 Jun 2025 16:18:04 +0000 (17:18 +0100)] 
[modula2] Comment tidyup in gm2-compiler/M2GCCDeclare.mod

This patch reformats three comments in the GNU GCC style.

gcc/m2/ChangeLog:

* gm2-compiler/M2GCCDeclare.mod (StartDeclareModuleScopeSeparate):
Reformat statement comments.
(StartDeclareModuleScopeWholeProgram): Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
13 days ago[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V
Jeff Law [Sat, 21 Jun 2025 14:24:58 +0000 (08:24 -0600)] 
[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V

The RISC-V prefetch support is broken in a few ways.  This addresses the data
side prefetch problems.  I'd mistakenly thought this BZ was a prefetch.i
related (which has deeper problems).

The basic problem is we were accepting any valid address when in fact there are
restrictions.  This patch more precisely defines the predicate such that we
allow

REG
REG+D

Where D must have the low 5 bits clear.  Note that absolute addresses fall into
the REG+D form using the x0 for the register operand since it always has the
value zero.  The test verifies REG, REG+D, ABS addressing modes that are valid
as well as REG+D and ABS which must be reloaded into a REG because the
displacement has low bits set.

An earlier version of this patch has gone through testing in my tester on rv32
and rv64.  Obviously I'll wait for pre-commit CI to do its thing before moving
forward.

This is a good backport candidate after simmering on the trunk for a bit.

PR target/118241
gcc/
* config/riscv/predicates.md (prefetch_operand): New predicate.
* config/riscv/constraints.md (Q): New constraint.
* config/riscv/riscv.md (prefetch): Use new predicate and constraint.
(riscv_prefetchi_<mode>): Similarly.

gcc/testsuite/
* gcc.target/riscv/pr118241.c: New test.

13 days agovalue-range: Use int instead of uint for wi::ctz result [PR120746]
Jakub Jelinek [Sat, 21 Jun 2025 14:09:08 +0000 (16:09 +0200)] 
value-range: Use int instead of uint for wi::ctz result [PR120746]

uint is some compatibility type in glibc sys/types.h enabled in misc/GNU
modes, so it doesn't exist on many hosts.
Furthermore, wi::ctz returns int rather than unsigned and the var is
only used in comparison to zero or as second argument of left shift, so
I think just using int instead of unsigned is better.

2025-06-21  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/120746
* value-range.cc (irange::snap): Use int type instead of uint.

2 weeks agoExtend afdo inliner to introduce speculative calls
Jan Hubicka [Sat, 21 Jun 2025 03:37:24 +0000 (05:37 +0200)] 
Extend afdo inliner to introduce speculative calls

This patch makes the AFDO's VPT to happen during early inlining.  This should
make the einline pass inside afdo pass unnecesary, but some inlining still
happens there - I will need to debug why that happens and will try to drop the
afdo's inliner incrementally.

get_inline_stack_in_node can now be used to produce inline stack out of
callgraph nodes which are marked as inline clones, so we do not need to iterate
tree-inline and IPA decisions phases like old code did.   I also added some
debug facilities - dumping of decisions and inline stacks, so one can match
them with data in gcov profile.

Former VPT pass identified all caes where in train run indirect call was inlined
and the inlined callee collected some samples. In this case it forced inline without
doing any checks, such as whether inlining is possible.

New code simply introduces speculative edges into callgraph and lets afdo inlining
to decide.  Old code also marked statements that were introduced during promotion
to prevent doing double speculation i.e.

   if (ptr == foo)
      .. inlined foo ...
   else
      ptr ();

to

   if (ptr == foo)
      .. inlined foo ...
   else if (ptr == foo)
      foo (); // for IPA inlining
   else
      ptr ();

Since inlning now happens much earlier, tracking the statements would be quite hard.
Instead I simply remove the targets from profile data which sould have same effect.

I also noticed that there is nothing setting max_count so all non-0 profile is
considered hot which I fixed too.

Training with ref run I now get
500.perlbench_r       1    160           9.93  *       1    162           9.84  *
502.gcc_r                                     NR                               NR
505.mcf_r             1    186           8.68  *       1    194           8.34  *
520.omnetpp_r         1    183           7.15  *       1    208           6.32  *
523.xalancbmk_r                               NR                               NR
525.x264_r            1     85.2        20.5   *       1     85.8        20.4   *
531.deepsjeng_r       1    165           6.93  *       1    176           6.51  *
541.leela_r           1    268           6.18  *       1    282           5.87  *
548.exchange2_r       1     86.3        30.4   *       1     88.9        29.5   *
557.xz_r              1    224           4.81  *       1    224           4.82  *
 Est. SPECrate2017_int_base              9.72
 Est. SPECrate2017_int_peak                                               9.33

503.bwaves_r                                  NR                               NR
507.cactuBSSN_r       1    107          11.9   *       1      105        12.0   *
508.namd_r            1    108           8.79  *       1      116         8.18  *
510.parest_r          1    143          18.3   *       1      156        16.8   *
511.povray_r          1    188          12.4   *       1      163        14.4   *
519.lbm_r             1     72.0        14.6   *       1       75.0      14.1   *
521.wrf_r             1    106          21.1   *       1      106        21.1   *
526.blender_r         1    147          10.3   *       1      147        10.4   *
527.cam4_r            1    110          15.9   *       1      118        14.8   *
538.imagick_r         1    104          23.8   *       1      105        23.7   *
544.nab_r             1    146          11.6   *       1      143        11.8   *
549.fotonik3d_r       1    134          29.0   *       1      169        23.1   *
554.roms_r            1     86.6        18.4   *       1       89.3      17.8   *
 Est. SPECrate2017_fp_base               15.4
 Est. SPECrate2017_fp_peak                                                14.9

Base is without profile feedback and peak is AFDO.

gcc/ChangeLog:

* auto-profile.cc (dump_inline_stack): New function.
(get_inline_stack_in_node): New function.
(get_relative_location_for_stmt): Add FN parameter.
(has_indirect_call): Remove.
(function_instance::find_icall_target_map): Add FN parameter.
(function_instance::remove_icall_target): New function.
(function_instance::read_function_instance): Set sum_max.
(autofdo_source_profile::get_count_info): Add NODE parameter.
(autofdo_source_profile::update_inlined_ind_target): Add NODE parameter.
(autofdo_source_profile::remove_icall_target): New function.
(afdo_indirect_call): Add INDIRECT_EDGE parameter; dump reason
for failure; do not check for recursion; do not inline call.
(afdo_vpt): Add INDIRECT_EDGE parameter.
(afdo_set_bb_count): Do not take PROMOTED set.
(afdo_vpt_for_early_inline): Remove.
(afdo_annotate_cfg): Do not take PROMOTED set.
(auto_profile): Do not call afdo_vpt_for_early_inline.
(afdo_callsite_hot_enough_for_early_inline): Dump count.
(remove_afdo_speculative_target): New function.
* auto-profile.h (afdo_vpt_for_early_inline): Declare.
(remove_afdo_speculative_target): Declare.
* ipa-inline.cc (inline_functions_by_afdo): Do VPT.
(early_inliner): Redirecct edges if inlining happened.
* tree-inline.cc (expand_call_inline): Add sanity check.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/afdo-vpt-earlyinline.c: Update template.
* gcc.dg/tree-prof/indir-call-prof-2.c: Update template.

2 weeks agoImplement afdo inliner
Jan Hubicka [Wed, 18 Jun 2025 10:10:25 +0000 (12:10 +0200)] 
Implement afdo inliner

This patch moves afdo inlining from early inliner into specialized one.
The reason is that early inliner is by design non-recursive while afdo
inliner needs to recurse.  In the past google handled it by increasing
early inliner iterations, but it can be done easily and cheaply without
it by simply recusing into inlined functions.

I will also look into moving VPT to early inliner now.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

* auto-profile.cc (get_inline_stack): Add fn parameter.
* ipa-inline.cc (want_early_inline_function_p): Do not care
about AFDO.
(inline_functions_by_afdo): New function.
(early_inliner): Use it.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/afdo-vpt-earlyinline.c: Update template.
* gcc.dg/tree-prof/indir-call-prof-2.c: Likewise.
* gcc.dg/tree-prof/afdo-inline.c: New test.

2 weeks agoRISC-V: Fix ICE for expand_select_vldi [PR120652]
Pan Li [Thu, 19 Jun 2025 10:58:17 +0000 (18:58 +0800)] 
RISC-V: Fix ICE for expand_select_vldi [PR120652]

The will be one ICE when expand pass, the bt similar as below.

during RTL pass: expand
red.c: In function 'main':
red.c:20:5: internal compiler error: in require, at machmode.h:323
   20 | int main() {
      |     ^~~~
0x2e0b1d6 internal_error(char const*, ...)
        ../../../gcc/gcc/diagnostic-global-context.cc:517
0xd0d3ed fancy_abort(char const*, int, char const*)
        ../../../gcc/gcc/diagnostic.cc:1803
0xc3da74 opt_mode<machine_mode>::require() const
        ../../../gcc/gcc/machmode.h:323
0xc3de2f opt_mode<machine_mode>::require() const
        ../../../gcc/gcc/poly-int.h:1383
0xc3de2f riscv_vector::expand_select_vl(rtx_def**)
        ../../../gcc/gcc/config/riscv/riscv-v.cc:4218
0x21c7d22 gen_select_vldi(rtx_def*, rtx_def*, rtx_def*)
        ../../../gcc/gcc/config/riscv/autovec.md:1344
0x134db6c maybe_expand_insn(insn_code, unsigned int, expand_operand*)
        ../../../gcc/gcc/optabs.cc:8257
0x134db6c expand_insn(insn_code, unsigned int, expand_operand*)
        ../../../gcc/gcc/optabs.cc:8288
0x11b21d3 expand_fn_using_insn
        ../../../gcc/gcc/internal-fn.cc:318
0xef32cf expand_call_stmt
        ../../../gcc/gcc/cfgexpand.cc:3097
0xef32cf expand_gimple_stmt_1
        ../../../gcc/gcc/cfgexpand.cc:4264
0xef32cf expand_gimple_stmt
        ../../../gcc/gcc/cfgexpand.cc:4411
0xef95b6 expand_gimple_basic_block
        ../../../gcc/gcc/cfgexpand.cc:6472
0xefb66f execute
        ../../../gcc/gcc/cfgexpand.cc:7223

The select_vl op_1 and op_2 may be the same const_int like (const_int 32).
And then maybe_legitimize_operands will:

1. First mov the const op_1 to a reg.
2. Resue the reg of op_1 for op_2 as the op_1 and op_2 is equal.

That will break the assumption that the op_2 of select_vl is immediate,
or something like CONST_INT_POLY.

The below test suites are passed for this patch series.
* The rv64gcv fully regression test.

PR target/120652

gcc/ChangeLog:

* config/riscv/autovec.md: Add immediate_operand for
select_vl operand 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr120652-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-3.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2 weeks agoDaily bump.
GCC Administrator [Sat, 21 Jun 2025 00:18:50 +0000 (00:18 +0000)] 
Daily bump.

2 weeks agocobol: Correct diagnostic strings for 32-bit builds.
James K. Lowden [Fri, 20 Jun 2025 16:43:51 +0000 (12:43 -0400)] 
cobol: Correct diagnostic strings for 32-bit builds.

Avoid %z for printf-family.  Cast pid_t to long.  Avoid use of YYUNDEF
for old Bison versions.

PR cobol/120621

gcc/cobol/ChangeLog:

* genapi.cc (parser_compile_ecs): Cast argument to unsigned long.
(parser_compile_dcls): Same.
(parser_division): RAII.
(inspect_tally): Cast argument to unsigned long.
* lexio.cc (cdftext::lex_open): Cast pid_t to long.
* parse.y: hard-code values for old versions of Bison, and message format.
* scan_ante.h (wait_for_the_child): Cast pid_t to long.

2 weeks agoFix range wrap check and enhance verify_range.
Andrew MacLeod [Fri, 20 Jun 2025 12:50:39 +0000 (08:50 -0400)] 
Fix range wrap check and enhance verify_range.

when snapping range bounds to satidsdaybitmask constraints, end bound overflow
and underflow checks were not working properly.
Also Adjust some comments, and enhance verify_range to make sure range pairs
are sorted properly.

PR tree-optimization/120701
gcc/
* value-range.cc (irange::verify_range): Verify range pairs are
sorted properly.
(irange::snap): Check for over/underflow properly.

gcc/testsuite/
* gcc.dg/pr120701.c: New.

2 weeks agoamdgcn: allow SImode in VCC_HI [PR120722]
Andrew Stubbs [Fri, 20 Jun 2025 16:43:37 +0000 (16:43 +0000)] 
amdgcn: allow SImode in VCC_HI [PR120722]

This patch isn't fully tested yet, but it fixes the build failure, so that
will do for now.  SImode was not allowed in VCC_HI because there were issues,
way back before the port went upstream, so it's possible we'll find out what
those issues were again soon.

gcc/ChangeLog:

PR target/120722
* config/gcn/gcn.cc (gcn_hard_regno_mode_ok): Allow SImode in VCC_HI.

2 weeks agolibgcobol: Add license.
James K. Lowden [Fri, 20 Jun 2025 14:16:26 +0000 (10:16 -0400)] 
libgcobol: Add license.

libgcobol/ChangeLog:

* LICENSE: New file.

2 weeks agoUse auto_vec in prime paths selftests [PR120634]
Jørgen Kvalsvik [Thu, 19 Jun 2025 19:00:07 +0000 (21:00 +0200)] 
Use auto_vec in prime paths selftests [PR120634]

The selftests had a bunch of memory leaks that showed up in make
selftest-valgrind as a result of not using auto_vec or other
explicitly calling release. Replacing vec with auto_vec makes the
problem go away.  The auto_vec_vec helper is made constructable from a
vec so that objects returned from functions can be automatically
managed too.

PR gcov-profile/120634

gcc/ChangeLog:

* prime-paths.cc (struct auto_vec_vec): Add constructor from
vec.
(test_split_components): Use auto_vec_vec.
(test_scc_internal_prime_paths): Ditto.
(test_scc_entry_exit_paths): Ditto.
(test_complete_prime_paths): Ditto.
(test_entry_prime_paths): Ditto.
(test_singleton_path): Ditto.

2 weeks agoFree buffer on function exit [PR120634]
Jørgen Kvalsvik [Thu, 19 Jun 2025 18:56:30 +0000 (20:56 +0200)] 
Free buffer on function exit [PR120634]

Using auto_vec ensures that the buffer is always free'd when the
function returns.

PR gcov-profile/120634

gcc/ChangeLog:

* prime-paths.cc (trie::paths): Use auto_vec.

2 weeks agotree-optimization/120654 - ICE with range query from IVOPTs
Richard Biener [Fri, 20 Jun 2025 09:14:38 +0000 (11:14 +0200)] 
tree-optimization/120654 - ICE with range query from IVOPTs

The following ICEs as we hand down an UNDEFINED range to where it
isn't expected.  Put the guard that's there earlier.

PR tree-optimization/120654
* vr-values.cc (range_fits_type_p): Check for undefined_p ()
before accessing type ().

* gcc.dg/torture/pr120654.c: New testcase.

2 weeks agox86: Get the widest vector mode from MOVE_MAX
H.J. Lu [Wed, 18 Jun 2025 21:03:48 +0000 (05:03 +0800)] 
x86: Get the widest vector mode from MOVE_MAX

Since MOVE_MAX defines the maximum number of bytes that an instruction
can move quickly between memory and registers, use it to get the widest
vector mode in vector loop when inlining memcpy and memset.

gcc/

PR target/120708
* config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Use
MOVE_MAX to get the widest vector mode in vector loop.

gcc/testsuite/

PR target/120708
* gcc.target/i386/memcpy-pr120708-1.c: New test.
* gcc.target/i386/memcpy-pr120708-2.c: Likewise.
* gcc.target/i386/memcpy-pr120708-3.c: Likewise.
* gcc.target/i386/memcpy-pr120708-4.c: Likewise.
* gcc.target/i386/memcpy-pr120708-5.c: Likewise.
* gcc.target/i386/memcpy-pr120708-6.c: Likewise.
* gcc.target/i386/memset-pr120708-1.c: Likewise.
* gcc.target/i386/memset-pr120708-2.c: Likewise.
* gcc.target/i386/memcpy-strategy-1.c: Drop dg-skip-if.  Replace
-march=atom with -mno-avx -msse2 -mtune=generic
-mtune-ctrl=^sse_typeless_stores.
* gcc.target/i386/memcpy-strategy-2.c: Likewise.
* gcc.target/i386/memcpy-vector_loop-1.c: Likewise.
* gcc.target/i386/memcpy-vector_loop-2.c: Likewise.
* gcc.target/i386/memset-vector_loop-1.c: Likewise.
* gcc.target/i386/memset-vector_loop-2.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2 weeks agoor1k: Improve If-Conversion by delaying cbranch splits
Stafford Horne [Thu, 19 Jun 2025 11:17:20 +0000 (12:17 +0100)] 
or1k: Improve If-Conversion by delaying cbranch splits

When working on PR120587 I found that the ce1 pass was not able to
properly optimize branches on OpenRISC.  This is because of the early
splitting of "compare" and "branch" instructions during the expand pass.

Convert the cbranch* instructions from define_expand to
define_insn_and_split.  This dalays the instruction split until after
the ce1 pass is done giving ce1 the best opportunity to perform the
optimizations on the original form of cbranch<mode>4 instructions.

gcc/ChangeLog:

* config/or1k/or1k.cc (or1k_noce_conversion_profitable_p): New
function.
(or1k_is_cmov_insn): New function.
(TARGET_NOCE_CONVERSION_PROFITABLE_P): Define macro.
* config/or1k/or1k.md (cbranchsi4): Convert to insn_and_split.
(cbranch<mode>4): Convert to insn_and_split.

Signed-off-by: Stafford Horne <shorne@gmail.com>
2 weeks agoor1k: Implement *extendbisi* to fix ICE in convert_mode_scalar [PR120587]
Stafford Horne [Wed, 18 Jun 2025 20:47:03 +0000 (21:47 +0100)] 
or1k: Implement *extendbisi* to fix ICE in convert_mode_scalar [PR120587]

After commit 2dcc6dbd8a0 ("emit-rtl: Use simplify_subreg_regno to
validate hardware subregs [PR119966]") the OpenRISC port is broken
again.

Add extend* iinstruction patterns for the SR_F pseudo registers to avoid
having to use the subreg conversions which no longer work.

gcc/ChangeLog:

PR target/120587
* config/or1k/or1k.md (zero_extendbisi2_sr_f): New expand.
(extendbisi2_sr_f): New expand.
* config/or1k/predicates.md (sr_f_reg_operand): New predicate.

Signed-off-by: Stafford Horne <shorne@gmail.com>
2 weeks ago[RISC-V] Force several tests to use rocket tuning
Jeff Law [Fri, 20 Jun 2025 02:58:56 +0000 (20:58 -0600)] 
[RISC-V] Force several tests to use rocket tuning

My tester has been flagging these regressions since the default cost model was
committed, along with several others

> unix/-march=rv64gc_zba_zbb_zbs_zicond: gcc: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2   scan-assembler-times \\.L[0-9]+\\:\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+add\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[a-x0-9]+\\s+\\.L[0-9]+\\: 1
> unix/-march=rv64gc_zba_zbb_zbs_zicond: gcc: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times \\.L[0-9]+\\:\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+add\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[a-x0-9]+\\s+\\.L[0-9]+\\: 1
> unix/-march=rv64gc_zba_zbb_zbs_zicond: gcc: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times \\.L[0-9]+\\:\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+addi\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[0-9]00\\s+add\\s+\\s*[a-x0-9]+,\\s*[a-x0-9]+,\\s*[a-x0-9]+\\s+\\.L[0-9]+\\: 1

I really question the value of checking the output that precisely in these
tests -- they're supposed to be checking vsetvl correctness and optimization,
so the ordering and such of scalar ops shouldn't really be important at all.

Regardless, since I don't know these tests at all I resisted the temptation to
rip out the undesirable aspects of the test.

Next up, fix the bogus scan or force the old cost model (rocket).  I choose the
latter as a path of least resistance and least surprise.

Waiting for pre-commit CI to spin.

gcc/testsuite
* gcc.target/riscv/rvv/vsetvl/avl_single-37.c: Force rocket tuning.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-17.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-18.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-19.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-20.c: Likewise.

2 weeks ago[PATCH] RISC-V: Use builtin clz/ctz when count_leading_zeros and count_trailing_zeros...
Sosutha Sethuramapandian [Fri, 20 Jun 2025 02:53:56 +0000 (20:53 -0600)] 
[PATCH] RISC-V: Use builtin clz/ctz when count_leading_zeros and count_trailing_zeros is used

longlong.h for RISCV should define count_leading_zeros and
count_trailing_zeros and COUNT_LEADING_ZEROS_0 when ZBB is enabled.

The following patch patch fixes the bug reported in,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110181

The divdi3 on riscv32 with zbb extension generates __clz_tab
instead of genearating  __builtin_clzll/__builtin_clz which is
not efficient since lookup table is emitted.

Updating longlong.h to use this __builtin_clzll/__builtin_clz
generates optimized code for the instruction.

PR target/110181

include/ChangeLog

* longlong.h  [__riscv] (count_leading_zeros): Define.
[__riscv] (count_trailing_zeros): Likewise.
[__riscv] (COUNT_LEADING_ZEROS_0): Likewise.

2 weeks agoRISC-V: Add test for vec_duplicate + vminu.vv combine case 1 with GR2VR cost 0, 1...
Pan Li [Thu, 19 Jun 2025 02:49:07 +0000 (10:49 +0800)] 
RISC-V: Add test for vec_duplicate + vminu.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vminu.vv combine to
vminu.vx, with the GR2VR cost is 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vminu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
2 weeks agoRISC-V: Add test for vec_duplicate + vminu.vv combine case 0 with GR2VR cost 0, 2...
Pan Li [Thu, 19 Jun 2025 02:47:33 +0000 (10:47 +0800)] 
RISC-V: Add test for vec_duplicate + vminu.vv combine case 0 with GR2VR cost 0, 2 and 15

Add asm dump check and run test for vec_duplicate + vminu.vv
combine to vminu.vx, with the GR2VR cost is 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macors.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add
test data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-2-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2 weeks agoRISC-V: Combine vec_duplicate + vminu.vv to vminu.vx on GR2VR cost
Pan Li [Thu, 19 Jun 2025 02:44:14 +0000 (10:44 +0800)] 
RISC-V: Combine vec_duplicate + vminu.vv to vminu.vx on GR2VR cost

This patch would like to combine the vec_duplicate + vminu.vv to the
vminu.vx.  From example as below code.  The related pattern will depend
on the cost of vec_duplicate from GR2VR.  Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.

Assume we have example code like below, GR2VR cost is 0.

  #define DEF_VX_BINARY(T, FUNC)                                      \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = FUNC (in[i], x);                                       \
  }

  uint32_t min(uint32 a, uint32 b)
  {
    return a > b ? b : a;
  }

  DEF_VX_BINARY(uint32_t, min)

Before this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │     beq a3,zero,.L8
  12   │     vsetvli a5,zero,e32,m1,ta,ma
  13   │     vmv.v.x v2,a2
  14   │     slli    a3,a3,32
  15   │     srli    a3,a3,32
  16   │ .L3:
  17   │     vsetvli a5,a3,e32,m1,ta,ma
  18   │     vle32.v v1,0(a1)
  19   │     slli    a4,a5,2
  20   │     sub a3,a3,a5
  21   │     add a1,a1,a4
  22   │     vminu.vv v1,v1,v2
  23   │     vse32.v v1,0(a0)
  24   │     add a0,a0,a4
  25   │     bne a3,zero,.L3

After this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │     beq a3,zero,.L8
  12   │     slli    a3,a3,32
  13   │     srli    a3,a3,32
  14   │ .L3:
  15   │     vsetvli a5,a3,e32,m1,ta,ma
  16   │     vle32.v v1,0(a1)
  17   │     slli    a4,a5,2
  18   │     sub a3,a3,a5
  19   │     add a1,a1,a4
  20   │     vminu.vx v1,v1,a2
  21   │     vse32.v v1,0(a0)
  22   │     add a0,a0,a4
  23   │     bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case UMIN.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op umin.

Signed-off-by: Pan Li <pan2.li@intel.com>
2 weeks agoDaily bump.
GCC Administrator [Fri, 20 Jun 2025 00:20:22 +0000 (00:20 +0000)] 
Daily bump.

2 weeks agolibgomp/target.c: Fix buffer size for 'omp requires' diagnostic
Tobias Burnus [Thu, 19 Jun 2025 19:16:42 +0000 (21:16 +0200)] 
libgomp/target.c: Fix buffer size for 'omp requires' diagnostic

One of the buffers that printed the list of set 'omp requires'
requirements missed the 'self' clause addition, being potentially
to short when all device-affecting clauses were passed. Solved it
by moving the sizeof(<string of all permitted values>" into a new
'#define' just above the associated gomp_requires_to_name function.

libgomp/ChangeLog:

* target.c (GOMP_REQUIRES_NAME_BUF_LEN): Define.
(GOMP_offload_register_ver, gomp_target_init): Use it for the
char buffer size.

2 weeks agolibgomp.texi: Document omp(x)::allocator::*, restructure memory allocator doc
Tobias Burnus [Thu, 19 Jun 2025 19:06:11 +0000 (21:06 +0200)] 
libgomp.texi: Document omp(x)::allocator::*, restructure memory allocator doc

libgomp/ChangeLog:

* libgomp.texi (omp_init_allocator): Refer to 'Memory allocation'
for available memory spaces.
(OMP_ALLOCATOR): Move list of traits and predefined memspaces
and allocators to ...
(Memory allocation): ... here. Document omp(x)::allocator::*;
minor wording tweaks, be more explicit about memkind, pinned and
pool_size.

Co-authored-by: waffl3x <waffl3x@baylibre.com>
2 weeks agoexpand: Align PARM_DECLs again to at least BITS_PER_WORD if possible [PR120689]
Jakub Jelinek [Thu, 19 Jun 2025 12:48:00 +0000 (14:48 +0200)] 
expand: Align PARM_DECLs again to at least BITS_PER_WORD if possible [PR120689]

The following testcase shows a regression caused by the r10-577 change
made for cris.  Before that change, the MEM holding (in this case 3 byte)
struct parameter was BITS_PER_WORD aligned, now it is just BITS_PER_UNIT
aligned and that causes significantly worse generated code.
So, the MAX (DECL_ALIGN (parm), BITS_PER_WORD) extra alignment clearly
doesn't help just STRICT_ALIGNMENT targets, but other targets as well.
Of course, it isn't worth doing stack realignment in the rare case of
MAX_SUPPORTED_STACK_ALIGNMENT < BITS_PER_WORD targets like cris, so the
patch only bumps the alignment if it won't go the
> MAX_SUPPORTED_STACK_ALIGNMENT path because of that optimization.
The patch keeps the gcc 15 behavior for avr, pru, m68k and cris (at
least some options for those) and restores the behavior before r10-577 on
other targets.

The change on the testcase on x86_64 is:
bar:
- movl %edi, %eax
- movzbl %dil, %r8d
- movl %esi, %ecx
- movzbl %sil, %r10d
- movl %edx, %r9d
- movzbl %dl, %r11d
- shrl $16, %edi
- andl $65280, %ecx
- shrl $16, %esi
- shrl $16, %edx
- andl $65280, %r9d
- orq %r10, %rcx
- movzbl %dl, %edx
- movzbl %sil, %esi
- andl $65280, %eax
- movzbl %dil, %edi
- salq $16, %rdx
- orq %r11, %r9
- salq $16, %rsi
- orq %r8, %rax
- salq $16, %rdi
- orq %r9, %rdx
- orq %rcx, %rsi
- orq %rax, %rdi
jmp foo

2025-06-19  Jakub Jelinek  <jakub@redhat.com>

PR target/120689
* function.cc (assign_parm_setup_block): Align parm to at least
word alignment even on !STRICT_ALIGNMENT targets, as long as
BITS_PER_WORD is not larger than MAX_SUPPORTED_STACK_ALIGNMENT.

* gcc.target/i386/pr120689.c: New test.

2 weeks agofortran: Statically initialize length of SAVEd character arrays
Mikael Morin [Wed, 18 Jun 2025 20:11:43 +0000 (22:11 +0200)] 
fortran: Statically initialize length of SAVEd character arrays

PR fortran/120713

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_trans_deferred_array): Statically
initialize deferred length variable for SAVEd character arrays.

gcc/testsuite/ChangeLog:

* gfortran.dg/save_alloc_character_1.f90: New test.

2 weeks agox86: Enable *mov<mode>_(and|or) only for -Oz
H.J. Lu [Sat, 24 May 2025 23:40:29 +0000 (07:40 +0800)] 
x86: Enable *mov<mode>_(and|or) only for -Oz

commit ef26c151c14a87177d46fd3d725e7f82e040e89f
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Dec 23 12:33:07 2021 +0000

    x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.

added "*mov<mode>_and" and extended "*mov<mode>_or" to transform
"mov $0,mem" to the shorter "and $0,mem" and "mov $-1,mem" to the shorter
"or $-1,mem" for -Oz.  But the new pattern:

(define_insn "*mov<mode>_and"
  [(set (match_operand:SWI248 0 "memory_operand" "=m")
    (match_operand:SWI248 1 "const0_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "and{<imodesuffix>}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "<MODE>")
   (set_attr "length_immediate" "1")])

and the extended pattern:

(define_insn "*mov<mode>_or"
  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
    (match_operand:SWI248 1 "constm1_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "or{<imodesuffix>}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "<MODE>")
   (set_attr "length_immediate" "1")])

aren't guarded for -Oz.  As a result, "and $0,mem" and "or $-1,mem" are
generated without -Oz.

1. Change *mov<mode>_and" to define_insn_and_split and split it to
"mov $0,mem" if not -Oz.
2. Change "*mov<mode>_or" to define_insn_and_split and split it to
"mov $-1,mem" if not -Oz.
3. Don't transform "mov $-1,reg" to "push $-1; pop reg" for -Oz since it
should be transformed to "or $-1,reg".

gcc/

PR target/120427
* config/i386/i386.md (*mov<mode>_and): Changed to
define_insn_and_split.  Split it to "mov $0,mem" if not -Oz.
(*mov<mode>_or): Changed to define_insn_and_split.  Split it
to "mov $-1,mem" if not -Oz.
(peephole2): Don't transform "mov $-1,reg" to "push $-1; pop reg"
for -Oz since it will be transformed to "or $-1,reg".

gcc/testsuite/

PR target/120427
* gcc.target/i386/cold-attribute-4.c: Compile with -Oz.
* gcc.target/i386/pr120427-1.c: New test.
* gcc.target/i386/pr120427-2.c: Likewise.
* gcc.target/i386/pr120427-3.c: Likewise.
* gcc.target/i386/pr120427-4.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2 weeks agoinstall.texi: Note that Texinfo < v7.1 may throw incorrect warnings.
Georg-Johann Lay [Wed, 18 Jun 2025 16:55:02 +0000 (18:55 +0200)] 
install.texi: Note that Texinfo < v7.1 may throw incorrect warnings.

PR other/115893
gcc/
* doc/install.texi (Prerequisites): Note that Texinfo older
than v7.1 may throw incorrect build warnings, cf.
https://lists.nongnu.org/archive/html/help-texinfo/2023-11/msg00004.html

2 weeks agoRISC-V: Add generic tune as default.
Dongyan Chen [Wed, 18 Jun 2025 11:47:28 +0000 (19:47 +0800)] 
RISC-V: Add generic tune as default.

According to the discussion in
https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686893.html, by creating
a -mtune=generic may be a good idea to slove the question regarding the branch
cost.

Changes for v2:
- Delete the code about -mcpu=generic.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_TUNE): Add "generic" tune.
* config/riscv/riscv.cc: Add generic_tune_info.
* config/riscv/riscv.h (RISCV_TUNE_STRING_DEFAULT): Change default tune.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_reg_return_reg_reg.c: New test.

2 weeks agodfp: Further decimal_real_to_integer fixes [PR120631]
Jakub Jelinek [Thu, 19 Jun 2025 06:57:27 +0000 (08:57 +0200)] 
dfp: Further decimal_real_to_integer fixes [PR120631]

Unfortunately, the following further testcase shows that there aren't
problems only with very large precisions and large exponents, but pretty
much anything larger than 64-bits.  After all, before _BitInt support dfp
didn't even have {,unsigned }__int128 <-> _Decimal{32,64,128,64x} support,
and the testcase again shows some of the conversions yielding zeros.
While the pr120631.c test worked even without the earlier patch.

So, this patch assumes 64-bit precision at most is ok and for anything
larger it just uses exponent 0 and multiplies afterwards.

2025-06-19  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/120631
* dfp.cc (decimal_real_to_integer): Use result multiplication not just
when precision > 128 and dn.exponent > 19, but when precision > 64
and dn.exponent > 0.

* gcc.dg/dfp/bitint-10.c: New test.
* gcc.dg/dfp/pr120631.c: New test.