Pan Li [Sat, 21 Jun 2025 01:00:16 +0000 (09:00 +0800)]
RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost
This patch combines vec_duplicate + vsaddu.vv into vsaddu.vx, as in the
example code below. The related pattern depends
on the cost of vec_duplicate from GR2VR: late-combine takes
action if the cost of GR2VR is zero, and rejects the combination
if the GR2VR cost is greater than zero.
Assume we have the example code below, with a GR2VR cost of 0.
#define DEF_VX_BINARY(T, FUNC) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = FUNC (in[i], x); \
}
T sat_add(T a, T b)
{
return (a + b) | (-(T)((T)(a + b) < a));
}
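For instance, with T = uint32_t the loop instantiates as below; with GR2VR
cost 0, late-combine is expected to replace the vmv.v.x broadcast (the
vec_duplicate) plus vsaddu.vv with a single vsaddu.vx on the scalar x.
A sketch of the instantiation (the function name is illustrative):
#include <stdint.h>
static inline uint32_t
sat_add_u32 (uint32_t a, uint32_t b)
{
  /* Unsigned saturating add: on overflow the mask becomes all ones.  */
  return (a + b) | (-(uint32_t)((uint32_t)(a + b) < a));
}
DEF_VX_BINARY (uint32_t, sat_add_u32)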
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case US_PLUS.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op us_plus.
Jakub Jelinek [Mon, 23 Jun 2025 14:08:34 +0000 (16:08 +0200)]
tailc: Allow musttail tail calls with -fsanitize=address [PR120608]
These testcases show another problem with -fsanitize=address
vs. musttail tail calls. In particular, there can be
.ASAN_MARK (POISON, &a, 4);
etc. calls after a tail call, and those prevent the tailc pass
from marking the musttail calls as [tail call].
Normally, the sanopt pass (which comes after tailc) optimizes those
away: if there are no .ASAN_CHECK calls or normal
function calls dominated by those .ASAN_MARK (POISON, ...) calls, the
poison is not needed, because in the epilog sequence (the one dealt with
in the patch posted earlier today) all the stack slots are unpoisoned anyway
(or poisoned for use-after-return).
Unlike __builtin_tsan_exit_function, .ASAN_MARK is not a real function
and is always expanded inline, so it can never be tail called successfully;
the patch therefore just ignores those for the cfun->has_musttail && diag_musttail
cases. If there is a non-musttail call, it will in the worst case fail during
expansion because of the epilog asan sequence.
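A minimal sketch of the problematic shape (hypothetical code; the actual
tests are the new pr120608-*.c files below): the scope of 'a' ends after
the musttail call, so an .ASAN_MARK (POISON, &a, 4) lands between the call
and the return:
int foo (int *);
int baz (int);

int
bar (int x)
{
  int a = x;
  foo (&a);
  /* End of 'a's scope: asan poisons it after the call below, which
     used to stop tailc from marking the call as [tail call].  */
  __attribute__((musttail)) return baz (x);
}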
2025-06-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120608
* tree-tailcall.cc (empty_eh_cleanup): Ignore .ASAN_MARK (POISON)
internal calls for the cfun->has_musttail case and diag_musttail.
(find_tail_calls): Likewise.
* c-c++-common/asan/pr120608-1.c: New test.
* c-c++-common/asan/pr120608-2.c: New test.
Jakub Jelinek [Mon, 23 Jun 2025 13:58:55 +0000 (15:58 +0200)]
expand: Allow musttail tail calls with -fsanitize=address [PR120608]
The following testcase is rejected by GCC 15 but accepted (with
s/gnu/clang/) by clang.
The problem is that we want to execute a sequence of instructions to
unpoison all automatic variables in the function and mark the var block
allocated for use-after-return sanitization poisoned after the call,
so we were just disabling tail calls if there are any instructions
returned from asan_emit_stack_protection.
It is fine and necessary for normal tail calls, but for musttail
tail calls we actually document that accessing the automatic vars of
the caller is UB as if they end their lifetime right before the tail
call, so we also want address sanitizer use-after-return to diagnose
that.
The following patch will only disable normal tail calls when that sequence
is present, for musttail it will arrange to emit a copy of that sequence
before the tail call sequence. That sequence only tweaks the shadow memory
and nothing in the code emitted by call expansion should touch the shadow
memory, so it is ok to emit it already before argument setup.
2025-06-23 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120608
* cfgexpand.cc: Include rtl-iter.h.
(expand_gimple_tailcall): Add ASAN_EPILOG_SEQ argument, if non-NULL
and expand_gimple_stmt emitted a tail call, emit a copy of that
insn sequence before the call sequence.
(expand_gimple_basic_block): Remove DISABLE_TAIL_CALLS argument, add
ASAN_EPILOG_SEQ argument. Disable tail call flag only on non-musttail
calls if that flag is set, pass it to expand_gimple_tailcall.
(pass_expand::execute): Pass VAR_RET_SEQ directly as last
expand_gimple_basic_block argument rather than its comparison with
NULL.
Pengfei Li [Wed, 11 Jun 2025 15:01:36 +0000 (15:01 +0000)]
vect: Use combined peeling and versioning for mutually aligned DRs
Current GCC uses either peeling or versioning, but not in combination,
to handle unaligned data references (DRs) during vectorization. This
limitation causes some loops with early break to fall back to scalar
code at runtime.
Consider the following loop with DRs in its early break condition:
for (int i = start; i < end; i++) {
if (a[i] == b[i])
break;
count++;
}
In the loop, references to a[] and b[] need to be strictly aligned for
vectorization because speculative reads that may cross page boundaries
are not allowed. Current GCC does versioning for this loop by creating a
runtime check like:
((&a[start] | &b[start]) & mask) == 0
to see if the two initial addresses both have zero low bits. If the above
runtime check fails, the loop will fall back to scalar code. However,
it's often possible that DRs are all unaligned at the beginning but they
become all aligned after a few loop iterations. We call this situation
DRs being "mutually aligned".
This patch enables combined peeling and versioning to avoid loops with
mutually aligned DRs falling back to scalar code. Specifically, the
function vect_peeling_supportable is updated in this patch to return a
three-state enum indicating how peeling can make all unsupportable DRs
aligned. In addition to previous true/false return values, a new state
peeling_maybe_supported is used to indicate that peeling may be able to
make these DRs aligned but we are not sure about it at compile time. In
this case, peeling should be combined with versioning so that a runtime
check will be generated to guard the peeled vectorized loop.
A new type of runtime check is also introduced for combined peeling and
versioning. It's enabled when LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true.
The new check tests if all DRs recorded in LOOP_VINFO_MAY_MISALIGN_STMTS
have the same lower address bits. For above loop case, the new test will
generate an XOR between two addresses, like:
((&a[start] ^ &b[start]) & mask) == 0
Therefore, if a and b have the same alignment step (element size) and
the same offset from an alignment boundary, a peeled vectorized loop
will run. This new runtime check also works for more than two DRs, with the LHS
expression extended accordingly.
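In plain C, the two kinds of runtime check for the two-DR case look roughly
as follows (helper names are illustrative; the vectorizer emits equivalent
gimple internally):
#include <stdint.h>

/* Versioning-only check: both addresses are already aligned.  */
static int
both_aligned (const void *a, const void *b, uintptr_t mask)
{
  return (((uintptr_t) a | (uintptr_t) b) & mask) == 0;
}

/* Combined peeling + versioning check: equal misalignment, so a
   peeled prologue can align both DRs at once.  */
static int
mutually_aligned (const void *a, const void *b, uintptr_t mask)
{
  return (((uintptr_t) a ^ (uintptr_t) b) & mask) == 0;
}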
This patch is bootstrapped and regression tested on x86_64-linux-gnu,
arm-linux-gnueabihf and aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_peeling_supportable): Return new
enum values to indicate if combined peeling and versioning can
potentially support vectorization.
(vect_enhance_data_refs_alignment): Support combined peeling and
versioning in vectorization analysis.
* tree-vect-loop-manip.cc (vect_create_cond_for_align_checks):
Add a new type of runtime check for mutually aligned DRs.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Set
default value of allow_mutual_alignment in the initializer list.
* tree-vectorizer.h (enum peeling_support): Define type of
peeling support for function vect_peeling_supportable.
(LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT): New access macro.
Mikael Morin [Sat, 21 Jun 2025 18:12:31 +0000 (20:12 +0200)]
match: Simplify doubled not, negate and conjugate operators to a non-lvalue
gcc/ChangeLog:
* match.pd (`-(-X)`, `~(~X)`, `conj(conj(X))`): Add a
NON_LVALUE_EXPR wrapper to the simplification of doubled unary
operators NEGATE_EXPR, BIT_NOT_EXPR and CONJ_EXPR.
Richard Biener [Fri, 20 Jun 2025 13:07:20 +0000 (15:07 +0200)]
tree-optimization/120729 - limit compile time in uninit_analysis::prune_phi_opnds
The testcase in this PR shows, on the GCC 14 branch, that in some
degenerate cases we can spend exponential time pruning always
initialized paths through a web of PHIs. The following adds
--param uninit-max-prune-work, defaulted to 100000, to limit that
to effectively O(1).
PR tree-optimization/120729
* gimple-predicate-analysis.h (uninit_analysis::prune_phi_opnds):
Add argument of work budget remaining.
* gimple-predicate-analysis.cc (uninit_analysis::prune_phi_opnds):
Likewise. Maintain and honor it throughout the recursion.
* params.opt (uninit-max-prune-work): New.
* doc/invoke.texi (uninit-max-prune-work): Document.
vregs: Use force_subreg when instantiating subregs [PR120721]
In this PR, we started with:
(subreg:V2DI (reg:DI virtual-reg) 0)
and vregs instantiated the virtual register to the argument pointer.
But:
(subreg:V2DI (reg:DI ap) 0)
is not a sensible subreg, since the argument pointer certainly can't
be referenced in V2DImode. This is (IMO correctly) rejected after
g:2dcc6dbd8a00caf7cfa8cac17b3fd1c33d658016.
The vregs code that instantiates the subreg above is specific to
rvalues and already creates new instructions for nonzero offsets.
It is therefore safe to use force_subreg instead of simplify_gen_subreg.
I did wonder whether we should instead say that a subreg of a
virtual register is invalid if the same subreg would be invalid
for the associated hard registers. But the point of virtual registers
is that the offsets from the hard registers are not known until after
expand has finished, and if an offset is nonzero, the virtual register
will be instantiated into a pseudo that contains the sum of the hard
register and the offset. The subreg would then be correct for that
pseudo. The subreg is only invalid in this case because there is
no offset.
gcc/
PR rtl-optimization/120721
* function.cc (instantiate_virtual_regs_in_insn): Use force_subreg
instead of simplify_gen_subreg when instantiating an rvalue SUBREG.
gcc/testsuite/
PR rtl-optimization/120721
* g++.dg/torture/pr120721.C: New test.
Ada: Replace hardcoded GNAT commands for GNAT tools
This replaces the hardcoded gnat{make,link,bind,ls} commands with expansion
of the GNAT{MAKE,BIND} variables computed by the configure machinery, during
the build of the GNAT tools.
The default GNATMAKE_FOR_HOST duplicates the default GNATMAKE, and someone
setting GNATMAKE in the toplevel configuration may want it applied for all
host compilations. Direct assignment of GNATMAKE_FOR_HOST keeps working.
gcc/ada/
PR ada/120106
* gcc-interface/Make-lang.in: Set GNAT{MAKE,BIND,LINK_LS}_FOR_HOST
from GNAT{MAKE,BIND} instead of using hardcoded commands.
gnattools/
PR ada/120106
* configure.ac: Remove ACX_NONCANONICAL_HOST and add ACX_PROG_GNAT.
* configure: Regenerate.
* Makefile.in: Do not substitute host_noncanonical but substitute
GNATMAKE and GNATBIND.
Set GNAT{MAKE,BIND,LINK_LS}_FOR_HOST from GNAT{MAKE,BIND} instead
of using hardcoded commands.
Andrew Pinski [Sun, 22 Jun 2025 18:35:19 +0000 (12:35 -0600)]
[RISC-V][PR target/119830] Fix RISC-V codegen on 32bit hosts
So this is Andrew's patch from the PR. We weren't clean for a 32bit host in
some of the arithmetic for constant synthesis.
I confirmed the bug on a 32bit linux host, then confirmed that Andrew's patch
from the PR fixes the problem, then ran Andrew's patch through my tester
successfully.
Naturally I'll wait for pre-commit testing, but I'm not expecting problems.
PR target/119830
gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Make arithmetic in bclr case
clean for 32 bit hosts.
gcc/testsuite/
* gcc.target/riscv/pr119830.c: New test.
Jeff Law [Sun, 22 Jun 2025 18:06:08 +0000 (12:06 -0600)]
[committed][PR rtl-optimization/120550] Drop REG_EQUAL note after ext-dce transformation
This bug was found by Edwin's fuzzing efforts on RISC-V, though it likely
affects other targets.
In simplest terms when ext-dce converts an extension into a (possibly
simplified) subreg copy it may make an attached REG_EQUAL note invalid.
In the case Edwin found the note was an extension, but I don't think that would
necessarily always be the case. The note could have other forms which
potentially need invalidation. So the safest thing to do is just remove any
attached REG_EQUAL or REG_EQUIV note.
Note adjusting Edwin's testcase in the obvious way to avoid having to interpret
printf output for pass/fail status makes the bug go latent. That's why no
testcase is included with this patch.
Bootstrapped and regression tested on x86_64. Obviously also verified it fixes
the testcase Edwin filed.
This is a good candidate for cherry-picking to the gcc-15 release branch after
simmering on the trunk a bit.
PR rtl-optimization/120550
gcc/
* ext-dce.cc (ext_dce_try_optimize_insn): Drop REG_EQUAL/REG_EQUIV
notes on modified insns.
This patch implements the target-specific ZERO_CALL_USED_REGS hook, since
with -fzero-call-used-regs=all the default hook tries to assign 0 to B0
(bit 0 of the BR register) and an ICE is thrown.
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_zero_call_used_regs):
New prototype and function.
(TARGET_ZERO_CALL_USED_REGS): Define macro.
Jan Hubicka [Sun, 22 Jun 2025 09:06:12 +0000 (11:06 +0200)]
Fix some problems with afdo propagation
This patch fixes problems I noticed by exploring profiles of some hot
functions in GCC. In particular, the propagation sometimes changed
precise 0 to afdo 0 for paths calling abort, and sometimes we could
propagate more when we accept that some paths have 0 count.
Finally, there was an important bug in computing all_known which
resulted in BB probabilities being quite broken after afdo.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* auto-profile.cc (update_count_by_afdo_count): Make static;
add variant accepting profile_count.
(afdo_find_equiv_class): Use update_count_by_afdo_count.
(afdo_propagate_edge): Likewise.
(afdo_propagate): Likewise.
(afdo_calculate_branch_prob): Fix handling of all_known.
(afdo_annotate_cfg): Annotate by 0 where both afdo and static
profile agrees.
Jan Hubicka [Sun, 22 Jun 2025 04:55:41 +0000 (06:55 +0200)]
Handle functions with 0 profile in auto-profile
This is the last part of the infrastructure to allow functions with
local profiles and 0 global autofdo counts.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* auto-profile.cc (afdo_set_bb_count): Dump inline stacks
and reasons when lookup failed.
(afdo_set_bb_count): Record info about BBs with zero AFDO count.
(afdo_annotate_cfg): Set profile to global0_afdo if there are
no samples in profile.
Gaius Mulley [Sun, 22 Jun 2025 03:13:26 +0000 (04:13 +0100)]
[PR modula2/120731] error in Strings.Pos causing sigsegv
This patch corrects the m2log library procedure function
Strings.Pos which incorrectly sliced the wrong component
of the source string. The incorrect slice could cause
a sigsegv if negative slice indices were generated.
Jan Hubicka [Sun, 22 Jun 2025 01:32:29 +0000 (03:32 +0200)]
Prevent possible overflows in ipa-profile
The bug in scaling the profile of fnsplit-produced clones made
some afdo counts during gcc bootstrap very large (2^59).
This made computations in ipa-profile silently overflow,
which caused the hot count to be identified as 1 instead of
a sane value.
While fixing the fnsplit bug prevents the overflow, I think the histogram code
should be made safe too. sreal is not very fitting here since its mantissa is
32-bit and not very good for adding many numbers which are possibly of very
different orders. So I use widest_int, though 128-bit arithmetic would be safe
(we are summing 60-bit counts multiplied by time estimates). I don't think
we have a readily available 128-bit type, and the code is not really time
critical since the histogram is computed once.
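A small sketch of why 64-bit accumulation is not enough here (illustrative;
the patch uses widest_int rather than __int128):
#include <stdint.h>

/* Counts can be close to 2^59 and are multiplied by time estimates,
   so individual products can already exceed 2^63; accumulate the sum
   in a 128-bit type.  */
static unsigned __int128
accumulate (const uint64_t *counts, const uint64_t *times, int n)
{
  unsigned __int128 sum = 0;
  for (int i = 0; i < n; i++)
    sum += (unsigned __int128) counts[i] * times[i];
  return sum;
}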
gcc/ChangeLog:
* ipa-profile.cc (ipa_profile): Use widest_int to avoid
possible overflows.
Jan Hubicka [Sun, 22 Jun 2025 01:26:36 +0000 (03:26 +0200)]
Scale up auto-profile counts
This patch makes auto-profile counts scale up when the train run has
too small a maximal count. This still happens on moderately sized train runs
such as SPEC2017 ref datasets, and scaling helps to avoid rounding errors in
cases where the number of samples is small.
Jan Hubicka [Sun, 22 Jun 2025 01:12:55 +0000 (03:12 +0200)]
Add GUESSED_GLOBAL0_AFDO
This patch adds GUESSED_GLOBAL0_AFDO profile quality. It can
be used to preserve local counts of functions which have 0 AFDO
profile.
I originally did not include it as it was not clear it would be useful, and
it turns quality from 3 bits into 4 bits, which means that we need to steal
another bit from the actual counters.
That is likely not a problem for profile_count, since counting up to 2^60 still
takes a while. However, with profile_probability I ran into the problem that
gimple FE testcases encode probabilities with the current representation and
thus changing profile_probability::n_bits would require updating all testcases.
Since probabilities never use GLOBAL0 qualities (they are always local),
adding bits is not necessary, but it requires encoding the quality in
the data type, which required adding accessors.
While working on this I also noticed that GUESSED_GLOBAL0 and
GUESSED_GLOBAL0_ADJUSTED are misordered. Qualities should be in
increasing order. This is also fixed.
auto-profile will be updated later.
Bootstrapped/regtested x86_64-linux, committed.
Honza
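A rough sketch of the representational change (illustrative only, not the
actual profile_count definition):
#include <stdint.h>

struct profile_count_sketch
{
  uint64_t m_val : 60;      /* n_bits set to 60; one bit stolen for quality */
  uint64_t m_quality : 4;   /* quality grown from 3 to 4 bits */
};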
gcc/ChangeLog:
* cgraph.cc (cgraph_node::make_profile_global0): Support
GUESSED_GLOBAL0_AFDO
* ipa-cp.cc (update_profiling_info): Use
GUESSED_GLOBAL0_AFDO.
* profile-count.cc (profile_probability::dump): Use
quality ().
(profile_probability::stream_in): Use m_adjusted_quality.
(profile_probability::stream_out): Use m_adjusted_quality.
(profile_count::combine_with_ipa_count): Use quality ().
(profile_probability::sqrt): Likewise.
* profile-count.h (enum profile_quality): Add
GUESSED_GLOBAL0_AFDO; reorder GUESSED_GLOBAL0_ADJUSTED and
GUESSED_GLOBAL0.
(profile_probability): Add min_quality; replace m_quality
with m_adjusted_quality; add set_quality; update all users
of quality.
(profile_count): Set n_bits to 60; make m_quality 4 bits;
update uses of quality.
(profile_count::afdo_zero, profile_count::globa0afdo): New.
Jan Hubicka [Sat, 21 Jun 2025 20:29:50 +0000 (22:29 +0200)]
Fix profile after fnsplit
When splitting functions, tree-inline correctly determined the entry count of
the new function part, but then, in case the entry block of the new function
part is in a loop, it scaled the body, which is not supposed to happen.
* tree-inline.cc (copy_cfg_body): Fix profile of split functions.
Jeff Law [Sat, 21 Jun 2025 14:24:58 +0000 (08:24 -0600)]
[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V
The RISC-V prefetch support is broken in a few ways. This addresses the data
side prefetch problems. I'd mistakenly thought this BZ was prefetch.i
related (which has deeper problems).
The basic problem is we were accepting any valid address when in fact there are
restrictions. This patch more precisely defines the predicate such that we
allow
REG
REG+D
Where D must have the low 5 bits clear. Note that absolute addresses fall into
the REG+D form using the x0 for the register operand since it always has the
value zero. The test verifies REG, REG+D, ABS addressing modes that are valid
as well as REG+D and ABS which must be reloaded into a REG because the
displacement has low bits set.
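A sketch of the accepted address forms, assuming __builtin_prefetch is
lowered to the RISC-V prefetch instructions (offsets illustrative):
void
f (char *p)
{
  __builtin_prefetch (p);        /* REG: accepted directly */
  __builtin_prefetch (p + 64);   /* REG+D, low 5 bits of D clear: accepted */
  __builtin_prefetch (p + 3);    /* low bits set: reloaded into a REG */
}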
An earlier version of this patch has gone through testing in my tester on rv32
and rv64. Obviously I'll wait for pre-commit CI to do its thing before moving
forward.
This is a good backport candidate after simmering on the trunk for a bit.
PR target/118241
gcc/
* config/riscv/predicates.md (prefetch_operand): New predicate.
* config/riscv/constraints.md (Q): New constraint.
* config/riscv/riscv.md (prefetch): Use new predicate and constraint.
(riscv_prefetchi_<mode>): Similarly.
gcc/testsuite/
* gcc.target/riscv/pr118241.c: New test.
Jakub Jelinek [Sat, 21 Jun 2025 14:09:08 +0000 (16:09 +0200)]
value-range: Use int instead of uint for wi::ctz result [PR120746]
uint is some compatibility type in glibc sys/types.h enabled in misc/GNU
modes, so it doesn't exist on many hosts.
Furthermore, wi::ctz returns int rather than unsigned and the var is
only used in comparison to zero or as second argument of left shift, so
I think just using int instead of unsigned is better.
2025-06-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120746
* value-range.cc (irange::snap): Use int type instead of uint.
Jan Hubicka [Sat, 21 Jun 2025 03:37:24 +0000 (05:37 +0200)]
Extend afdo inliner to introduce speculative calls
This patch makes the AFDO's VPT to happen during early inlining. This should
make the einline pass inside afdo pass unnecesary, but some inlining still
happens there - I will need to debug why that happens and will try to drop the
afdo's inliner incrementally.
get_inline_stack_in_node can now be used to produce inline stack out of
callgraph nodes which are marked as inline clones, so we do not need to iterate
tree-inline and IPA decisions phases like old code did. I also added some
debug facilities - dumping of decisions and inline stacks, so one can match
them with data in gcov profile.
Former VPT pass identified all caes where in train run indirect call was inlined
and the inlined callee collected some samples. In this case it forced inline without
doing any checks, such as whether inlining is possible.
New code simply introduces speculative edges into callgraph and lets afdo inlining
to decide. Old code also marked statements that were introduced during promotion
to prevent doing double speculation i.e.
if (ptr == foo)
.. inlined foo ...
else
ptr ();
to
if (ptr == foo)
.. inlined foo ...
else if (ptr == foo)
foo (); // for IPA inlining
else
ptr ();
Since inlining now happens much earlier, tracking the statements would be
quite hard. Instead I simply remove the targets from the profile data, which
should have the same effect.
I also noticed that there is nothing setting max_count, so all non-0 profile is
considered hot, which I fixed too.
Training with the ref run I now get
500.perlbench_r 1 160 9.93 * 1 162 9.84 *
502.gcc_r NR NR
505.mcf_r 1 186 8.68 * 1 194 8.34 *
520.omnetpp_r 1 183 7.15 * 1 208 6.32 *
523.xalancbmk_r NR NR
525.x264_r 1 85.2 20.5 * 1 85.8 20.4 *
531.deepsjeng_r 1 165 6.93 * 1 176 6.51 *
541.leela_r 1 268 6.18 * 1 282 5.87 *
548.exchange2_r 1 86.3 30.4 * 1 88.9 29.5 *
557.xz_r 1 224 4.81 * 1 224 4.82 *
Est. SPECrate2017_int_base 9.72
Est. SPECrate2017_int_peak 9.33
Base is without profile feedback and peak is AFDO.
gcc/ChangeLog:
* auto-profile.cc (dump_inline_stack): New function.
(get_inline_stack_in_node): New function.
(get_relative_location_for_stmt): Add FN parameter.
(has_indirect_call): Remove.
(function_instance::find_icall_target_map): Add FN parameter.
(function_instance::remove_icall_target): New function.
(function_instance::read_function_instance): Set sum_max.
(autofdo_source_profile::get_count_info): Add NODE parameter.
(autofdo_source_profile::update_inlined_ind_target): Add NODE parameter.
(autofdo_source_profile::remove_icall_target): New function.
(afdo_indirect_call): Add INDIRECT_EDGE parameter; dump reason
for failure; do not check for recursion; do not inline call.
(afdo_vpt): Add INDIRECT_EDGE parameter.
(afdo_set_bb_count): Do not take PROMOTED set.
(afdo_vpt_for_early_inline): Remove.
(afdo_annotate_cfg): Do not take PROMOTED set.
(auto_profile): Do not call afdo_vpt_for_early_inline.
(afdo_callsite_hot_enough_for_early_inline): Dump count.
(remove_afdo_speculative_target): New function.
* auto-profile.h (afdo_vpt_for_early_inline): Declare.
(remove_afdo_speculative_target): Declare.
* ipa-inline.cc (inline_functions_by_afdo): Do VPT.
(early_inliner): Redirect edges if inlining happened.
* tree-inline.cc (expand_call_inline): Add sanity check.
Jan Hubicka [Wed, 18 Jun 2025 10:10:25 +0000 (12:10 +0200)]
Implement afdo inliner
This patch moves afdo inlining from the early inliner into a specialized one.
The reason is that the early inliner is by design non-recursive, while the afdo
inliner needs to recurse. In the past Google handled it by increasing
early inliner iterations, but it can be done easily and cheaply without
that by simply recursing into inlined functions.
I will also look into moving VPT to the early inliner now.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* auto-profile.cc (get_inline_stack): Add fn parameter.
* ipa-inline.cc (want_early_inline_function_p): Do not care
about AFDO.
(inline_functions_by_afdo): New function.
(early_inliner): Use it.
The select_vl op_1 and op_2 may be the same const_int, like (const_int 32).
And then maybe_legitimize_operands will:
1. First mov the const op_1 to a reg.
2. Reuse the reg of op_1 for op_2, as op_1 and op_2 are equal.
That will break the assumption that the op_2 of select_vl is an immediate,
or something like CONST_INT_POLY.
The below test suites are passed for this patch series.
* The rv64gcv full regression test.
PR target/120652
gcc/ChangeLog:
* config/riscv/autovec.md: Add immediate_operand for
select_vl operand 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr120652-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-3.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652.h: New test.
Andrew MacLeod [Fri, 20 Jun 2025 12:50:39 +0000 (08:50 -0400)]
Fix range wrap check and enhance verify_range.
When snapping range bounds to satisfy bitmask constraints, the end-bound
overflow and underflow checks were not working properly.
Also adjust some comments, and enhance verify_range to make sure range pairs
are sorted properly.
PR tree-optimization/120701
gcc/
* value-range.cc (irange::verify_range): Verify range pairs are
sorted properly.
(irange::snap): Check for over/underflow properly.
Andrew Stubbs [Fri, 20 Jun 2025 16:43:37 +0000 (16:43 +0000)]
amdgcn: allow SImode in VCC_HI [PR120722]
This patch isn't fully tested yet, but it fixes the build failure, so that
will do for now. SImode was not allowed in VCC_HI because there were issues,
way back before the port went upstream, so it's possible we'll find out what
those issues were again soon.
gcc/ChangeLog:
PR target/120722
* config/gcn/gcn.cc (gcn_hard_regno_mode_ok): Allow SImode in VCC_HI.
Jørgen Kvalsvik [Thu, 19 Jun 2025 19:00:07 +0000 (21:00 +0200)]
Use auto_vec in prime paths selftests [PR120634]
The selftests had a bunch of memory leaks that showed up in make
selftest-valgrind as a result of not using auto_vec or otherwise
explicitly calling release. Replacing vec with auto_vec makes the
problem go away. The auto_vec_vec helper is made constructible from a
vec so that objects returned from functions can be automatically
managed too.
H.J. Lu [Wed, 18 Jun 2025 21:03:48 +0000 (05:03 +0800)]
x86: Get the widest vector mode from MOVE_MAX
Since MOVE_MAX defines the maximum number of bytes that an instruction
can move quickly between memory and registers, use it to get the widest
vector mode in vector loop when inlining memcpy and memset.
gcc/
PR target/120708
* config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Use
MOVE_MAX to get the widest vector mode in vector loop.
Stafford Horne [Thu, 19 Jun 2025 11:17:20 +0000 (12:17 +0100)]
or1k: Improve If-Conversion by delaying cbranch splits
When working on PR120587 I found that the ce1 pass was not able to
properly optimize branches on OpenRISC. This is because of the early
splitting of "compare" and "branch" instructions during the expand pass.
Convert the cbranch* instructions from define_expand to
define_insn_and_split. This delays the instruction split until after
the ce1 pass is done giving ce1 the best opportunity to perform the
optimizations on the original form of cbranch<mode>4 instructions.
gcc/ChangeLog:
* config/or1k/or1k.cc (or1k_noce_conversion_profitable_p): New
function.
(or1k_is_cmov_insn): New function.
(TARGET_NOCE_CONVERSION_PROFITABLE_P): Define macro.
* config/or1k/or1k.md (cbranchsi4): Convert to insn_and_split.
(cbranch<mode>4): Convert to insn_and_split.
Stafford Horne [Wed, 18 Jun 2025 20:47:03 +0000 (21:47 +0100)]
or1k: Implement *extendbisi* to fix ICE in convert_mode_scalar [PR120587]
After commit 2dcc6dbd8a0 ("emit-rtl: Use simplify_subreg_regno to
validate hardware subregs [PR119966]") the OpenRISC port is broken
again.
Add extend* instruction patterns for the SR_F pseudo registers to avoid
having to use the subreg conversions which no longer work.
gcc/ChangeLog:
PR target/120587
* config/or1k/or1k.md (zero_extendbisi2_sr_f): New expand.
(extendbisi2_sr_f): New expand.
* config/or1k/predicates.md (sr_f_reg_operand): New predicate.
I really question the value of checking the output that precisely in these
tests -- they're supposed to be checking vsetvl correctness and optimization,
so the ordering and such of scalar ops shouldn't really be important at all.
Regardless, since I don't know these tests at all I resisted the temptation to
rip out the undesirable aspects of the test.
Next up, fix the bogus scan or force the old cost model (rocket). I chose the
latter as the path of least resistance and least surprise.
[PATCH] RISC-V: Use builtin clz/ctz when count_leading_zeros and count_trailing_zeros is used
longlong.h for RISCV should define count_leading_zeros and
count_trailing_zeros and COUNT_LEADING_ZEROS_0 when ZBB is enabled.
The following patch fixes the bug reported in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110181
The divdi3 on riscv32 with the zbb extension uses __clz_tab
instead of generating __builtin_clzll/__builtin_clz, which is
not efficient since a lookup table is emitted.
Updating longlong.h to use __builtin_clzll/__builtin_clz
generates optimized code for the instruction.
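A sketch of the longlong.h-style definitions this implies (illustrative;
the committed version is conditional on ZBB and on the word size):
/* For a 64-bit word size; a 32-bit configuration would use
   __builtin_clz/__builtin_ctz and a COUNT_LEADING_ZEROS_0 of 32.  */
#define count_leading_zeros(count, x)  ((count) = __builtin_clzll (x))
#define count_trailing_zeros(count, x) ((count) = __builtin_ctzll (x))
#define COUNT_LEADING_ZEROS_0 64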
Pan Li [Thu, 19 Jun 2025 02:44:14 +0000 (10:44 +0800)]
RISC-V: Combine vec_duplicate + vminu.vv to vminu.vx on GR2VR cost
This patch combines vec_duplicate + vminu.vv into vminu.vx, as in the
example code below. The related pattern depends
on the cost of vec_duplicate from GR2VR: late-combine takes
action if the cost of GR2VR is zero, and rejects the combination
if the GR2VR cost is greater than zero.
Assume we have the example code below, with a GR2VR cost of 0.
#define DEF_VX_BINARY(T, FUNC) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = FUNC (in[i], x); \
}
uint32_t min(uint32_t a, uint32_t b)
{
return a > b ? b : a;
}
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case UMIN.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op umin.
Tobias Burnus [Thu, 19 Jun 2025 19:16:42 +0000 (21:16 +0200)]
libgomp/target.c: Fix buffer size for 'omp requires' diagnostic
One of the buffers that print the list of set 'omp requires'
requirements missed the 'self' clause addition, being potentially
too short when all device-affecting clauses were passed. Solved it
by moving the sizeof(<string of all permitted values>) into a new
'#define' just above the associated gomp_requires_to_name function.
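A sketch of the idea (the clause list is illustrative, not necessarily the
exact set): derive the buffer length from the longest possible clause string
at compile time, next to the function that fills it:
#define GOMP_REQUIRES_NAME_BUF_LEN \
  sizeof ("unified_address, unified_shared_memory, reverse_offload, self")

/* ... at each use site: */
char buf[GOMP_REQUIRES_NAME_BUF_LEN];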
libgomp/ChangeLog:
* target.c (GOMP_REQUIRES_NAME_BUF_LEN): Define.
(GOMP_offload_register_ver, gomp_target_init): Use it for the
char buffer size.
* libgomp.texi (omp_init_allocator): Refer to 'Memory allocation'
for available memory spaces.
(OMP_ALLOCATOR): Move list of traits and predefined memspaces
and allocators to ...
(Memory allocation): ... here. Document omp(x)::allocator::*;
minor wording tweaks, be more explicit about memkind, pinned and
pool_size.
Jakub Jelinek [Thu, 19 Jun 2025 12:48:00 +0000 (14:48 +0200)]
expand: Align PARM_DECLs again to at least BITS_PER_WORD if possible [PR120689]
The following testcase shows a regression caused by the r10-577 change
made for cris. Before that change, the MEM holding (in this case 3 byte)
struct parameter was BITS_PER_WORD aligned, now it is just BITS_PER_UNIT
aligned and that causes significantly worse generated code.
So, the MAX (DECL_ALIGN (parm), BITS_PER_WORD) extra alignment clearly
doesn't help just STRICT_ALIGNMENT targets, but other targets as well.
Of course, it isn't worth doing stack realignment in the rare case of
MAX_SUPPORTED_STACK_ALIGNMENT < BITS_PER_WORD targets like cris, so the
patch only bumps the alignment if it won't go the
> MAX_SUPPORTED_STACK_ALIGNMENT path because of that optimization.
The patch keeps the gcc 15 behavior for avr, pru, m68k and cris (at
least some options for those) and restores the behavior before r10-577 on
other targets.
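The shape of the regressing case, per the description above (a sketch):
struct S { char a, b, c; };   /* 3-byte struct parameter */

int
f (struct S s)
{
  /* With only BITS_PER_UNIT alignment of s's stack slot, the generated
     code is significantly worse than with word alignment.  */
  return s.a + s.b + s.c;
}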
PR target/120689
* function.cc (assign_parm_setup_block): Align parm to at least
word alignment even on !STRICT_ALIGNMENT targets, as long as
BITS_PER_WORD is not larger than MAX_SUPPORTED_STACK_ALIGNMENT.
The earlier commit "x86: PR target/103773: Fix wrong-code with -Oz from pop
to memory." added "*mov<mode>_and" and extended "*mov<mode>_or" to transform
"mov $0,mem" into the shorter "and $0,mem" and "mov $-1,mem" into the shorter
"or $-1,mem" for -Oz. But the new patterns
aren't guarded for -Oz. As a result, "and $0,mem" and "or $-1,mem" are
generated without -Oz.
1. Change *mov<mode>_and" to define_insn_and_split and split it to
"mov $0,mem" if not -Oz.
2. Change "*mov<mode>_or" to define_insn_and_split and split it to
"mov $-1,mem" if not -Oz.
3. Don't transform "mov $-1,reg" to "push $-1; pop reg" for -Oz since it
should be transformed to "or $-1,reg".
gcc/
PR target/120427
* config/i386/i386.md (*mov<mode>_and): Changed to
define_insn_and_split. Split it to "mov $0,mem" if not -Oz.
(*mov<mode>_or): Changed to define_insn_and_split. Split it
to "mov $-1,mem" if not -Oz.
(peephole2): Don't transform "mov $-1,reg" to "push $-1; pop reg"
for -Oz since it will be transformed to "or $-1,reg".
gcc/testsuite/
PR target/120427
* gcc.target/i386/cold-attribute-4.c: Compile with -Oz.
* gcc.target/i386/pr120427-1.c: New test.
* gcc.target/i386/pr120427-2.c: Likewise.
* gcc.target/i386/pr120427-3.c: Likewise.
* gcc.target/i386/pr120427-4.c: Likewise.
Dongyan Chen [Wed, 18 Jun 2025 11:47:28 +0000 (19:47 +0800)]
RISC-V: Add generic tune as default.
According to the discussion in
https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686893.html, creating
a -mtune=generic may be a good way to solve the question regarding branch
cost.
Changes for v2:
- Delete the code about -mcpu=generic.
Jakub Jelinek [Thu, 19 Jun 2025 06:57:27 +0000 (08:57 +0200)]
dfp: Further decimal_real_to_integer fixes [PR120631]
Unfortunately, the following further testcase shows that there aren't
problems only with very large precisions and large exponents, but pretty
much anything larger than 64-bits. After all, before _BitInt support dfp
didn't even have {,unsigned }__int128 <-> _Decimal{32,64,128,64x} support,
and the testcase again shows some of the conversions yielding zeros.
While the pr120631.c test worked even without the earlier patch.
So, this patch assumes 64-bit precision at most is ok and for anything
larger it just uses exponent 0 and multiplies afterwards.
2025-06-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* dfp.cc (decimal_real_to_integer): Use result multiplication not just
when precision > 128 and dn.exponent > 19, but when precision > 64
and dn.exponent > 0.
* gcc.dg/dfp/bitint-10.c: New test.
* gcc.dg/dfp/pr120631.c: New test.
Kito Cheng [Tue, 17 Jun 2025 04:56:17 +0000 (12:56 +0800)]
RISC-V: Adding cost model for zilsd
The motivation of this patch is that we want to use ld/sd if possible when
zilsd is enabled; however, the subreg pass may split that into two lw/sw
instructions because of the cost model, which only checks the cost of a
64-bit reg move. That's why we need to adjust the cost of 64-bit reg moves
as well.
However, even when we adjust the cost model, a 64-bit shift still uses 32-bit
loads because it already got split at expand time. This may need a fix
on the expander side, which apparently needs some more time to
investigate, so I just added a testcase with XFAIL to show the current
behavior; we can fix that...when we have time.
In the long term, we may add a new field to riscv_tune_param to control
the cost model for that.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_cost_model): Add cost model for
zilsd.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zilsd-code-gen-split-subreg-1.c: New test.
* gcc.target/riscv/zilsd-code-gen-split-subreg-2.c: New test.
emit-rtl: Use simplify_subreg_regno to validate hardware subregs [PR119966]
PR119966 showed that combine could generate unfoldable hardware subregs
for pru-unknown-elf. To fix, strengthen the checks performed by
validate_subreg.
The simplify_subreg_regno performs more validity checks than
the simple info.representable_p. Most importantly, the
targetm.hard_regno_mode_ok hook is called to ensure the hardware
register is valid in subreg's outer mode. This fixes the root cause
of PR119966.
The checks for stack-related registers are bypassed because the i386
backend generates them, in this seemingly valid peephole optimization:
Testing done:
* No regressions were detected for C and C++ on x86_64-pc-linux-gnu.
* "contrib/compare-all-tests i386" showed no difference in code
generation.
* No regressions for pru-unknown-elf.
* Reverted r16-809-gf725d6765373f7 to expose the now latent PR119966.
Then ensured pru-unknown-elf build is ok. Only two cases regressed
where rnreg pass transforms a valid hardware subreg into invalid
one. But I think that is not related to combine's PR119966:
gcc.c-torture/execute/20040709-1.c
gcc.c-torture/execute/20040709-2.c
PR target/119966
gcc/ChangeLog:
* emit-rtl.cc (validate_subreg): Call simplify_subreg_regno
instead of checking info.representable_p.
* rtl.h (simplify_subreg_regno): Add new argument
allow_stack_regs.
* rtlanal.cc (simplify_subreg_regno): Do not reject
stack-related registers if allow_stack_regs is true.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
This implements the final piece of the revised CWG2563 wording:
"It exits the scope of promise only if the coroutine completed
without suspending."
Considering the coroutine to be made up of two components; a
'ramp' and a 'body' where the body represents the user's original
code and the ramp is responsible for setup of that and for
returning some object to the original caller.
Coroutine state, and responsibility for its release.
A coroutine has some state that persists across suspensions.
The state has two components:
* State that is specified by the standard and persists for the entire
life of the coroutine.
* Local state that is constructed/destructed as scopes in the original
function body are entered/exited. The destruction of local state is
always the responsibility of the body code.
The persistent state (and the overall storage for the state) must be
managed in two places:
* The ramp function (which allocates and builds this - and can, in some
cases, be responsible for destroying it)
* The re-written function body which can destroy it when that body
completes its final suspend - or when the handle.destroy () is called.
In all cases the ramp holds responsibility for constructing the standard-
mandated persistent state.
There are four ways in which the ramp might be re-entered after starting
the function body:
A The body could suspend (one might expect that to be the 'normal' case
for most coroutines).
B The body might complete either synchronously or via continuations.
C An exception might be thrown during the setup of the initial await
expression, before the initial awaiter resumes.
D An exception might be processed by promise.unhandled_exception () and
that, in turn, might re-throw it (or throw something else). In this
case, the coroutine is considered suspended at the final suspension
point.
Once the coroutine has passed initial suspend (i.e. the initial awaiter
await_resume() has been called) the body is considered to have a use of
the state.
Until the ramp return value has been constructed, the ramp is considered
to have a use of the state.
To manage these interacting conditions we allocate a reference counter
for the frame state. This is initialised to 1 by the ramp as part of its
startup (note that failures/exceptions in the startup code are handled
locally to the ramp).
When the body returns (either normally, or by exception) the ramp releases
its use.
Once the rewritten coroutine body is started, the body is considered to
have a use of the frame. This use (potentially) needs to be released if
an exception is thrown from the body. We implement this using an eh-only
cleanup around the initial await. If we have the case D above, then we
do not release the body use.
In case:
A, typically the ramp would be re-entered with the body holding a use,
and therefore the ramp should not destroy the state.
B, both the body and ramp will have released their uses, and the ramp
should destroy the state.
C, we must arrange for the body to release its use, because we require
the ramp to cleanup in this circumstance.
D is an outlier, since the responsibility for destruction of the state
now rests with the user's code (via a handle.destroy() call).
NOTE: In the case that the body has never suspended before such an
exception occurs, the only reasonable way for the user code to obtain the
necessary handle is if unhandled_exception() throws the handle or some
object that contains the handle. That is outside of the designs here -
if the user code might need this corner-case, then such provision will
have to be made.
In the ramp, we implement destruction for the persistent frame state by
means of cleanups. These are run conditionally when the reference count
is 0 signalling that both the body and the ramp have completed.
In the body, once we pass the final suspend, then we test the use and
delete the state if the use is 0.
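A rough plain-C model of the counting protocol (illustrative only; the real
implementation emits tree-level conditional cleanups, and destroy_frame here
is a hypothetical stand-in for the frame/promise/argument-copy destruction):
struct coro_frame_sketch
{
  unsigned refcount;   /* initialised to 1 by the ramp at startup */
  /* ... promise object, argument copies, locals ... */
};

extern void destroy_frame (struct coro_frame_sketch *);  /* hypothetical */

static void
release_use (struct coro_frame_sketch *f)
{
  /* The ramp calls this when it regains control; the body calls it
     after passing the final suspend, or on an exception before the
     initial await resume (cases B and C above).  */
  if (--f->refcount == 0)
    destroy_frame (f);
}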
PR c++/115908
PR c++/118074
PR c++/95615
gcc/cp/ChangeLog:
* coroutines.cc (coro_frame_refcount_id): New.
(coro_init_identifiers): Initialise coro_frame_refcount_id.
(build_actor_fn): Set up initial_await_resume_called. Handle
decrementing of the frame reference count. Return directly to
the caller if that is non-zero.
(cp_coroutine_transform::wrap_original_function_body): Use a
conditional eh-only cleanup around the initial await expression
to release the body use on exception before initial await
resume.
(cp_coroutine_transform::build_ramp_function): Wrap the called
body in a cleanup that releases a use of the frame when we
return to the ramp. Implement frame, promise and argument copy
destruction via conditional cleanups when the frame use count
is zero.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr115908.C: Move to...
* g++.dg/coroutines/torture/pr115908.C: ...here.
* g++.dg/coroutines/torture/pr95615-02.C: Move to...
* g++.dg/coroutines/torture/pr95615-01-promise-ctor-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615-03.C: Move to...
* g++.dg/coroutines/torture/pr95615-02-get-return-object-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615-01.C: Move to...
* g++.dg/coroutines/torture/pr95615-03-initial-suspend-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615-04.C: Move to...
* g++.dg/coroutines/torture/pr95615-04-initial-await-ready-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615-05.C: Move to...
* g++.dg/coroutines/torture/pr95615-05-initial-await-suspend-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615.inc: Add more cases and ensure that the
code completes properly when no exceptions are thrown.
* g++.dg/coroutines/torture/pr95615-00-nothing-throws.C: New test.
* g++.dg/coroutines/torture/pr95615-06-initial-await-resume-throws.C: New test.
* g++.dg/coroutines/torture/pr95615-07-body-throws.C: New test.
* g++.dg/coroutines/torture/pr95615-08-initial-suspend-throws-uhe-throws.C: New test.
* g++.dg/coroutines/torture/pr95615-09-body-throws-uhe-throws.C: New test.
Andrew MacLeod [Wed, 28 May 2025 20:27:16 +0000 (16:27 -0400)]
Improve contains_p and intersect with bitmasks.
Improve the way contains_p (wide_int) and intersect behave with
singletons and bitmasks. Also fix a buglet in bitmask_intersect when the
result is a singleton which is not in the current range.
PR tree-optimization/119039
gcc/
* value-range.cc (irange::contains_p): Call wide_int version of
contains_p for singleton ranges.
(irange::intersect): If either range is a singleton, use
contains_p.
Harald Anlauf [Tue, 17 Jun 2025 19:09:32 +0000 (21:09 +0200)]
Fortran: various fixes for STAT/LSTAT/FSTAT intrinsics [PR82480]
The GNU intrinsics STAT/LSTAT/FSTAT were inherited from g77, but changed
the names of some keywords: FILE became NAME, and SARRAY became VALUES,
which are the keywords documented in the gfortran manual.
Adjust code and libgfortran error messages to reflect this change.
Furthermore, add compile-time checking that INTENT(OUT) arguments are
definable, and that array VALUES has at least size 13.
Document that integer arguments are of default kind, and that overflows
in conversion to integer return -1 in VALUES.
* intrinsics/stat.c (stat_i4_sub_0): Fix argument names. Rename
SARRAY to VALUES also in error message. When array VALUES is
KIND=4, get only stat components that do not overflow INT32_MAX,
otherwise set the corresponding VALUES elements to -1.
(stat_i4_sub): Fix argument names.
(lstat_i4_sub): Likewise.
(stat_i8_sub_0): Likewise.
(stat_i8_sub): Likewise.
(lstat_i8_sub): Likewise.
(stat_i4): Likewise.
(stat_i8): Likewise.
(lstat_i4): Likewise.
(lstat_i8): Likewise.
(fstat_i4_sub): Likewise.
(fstat_i8_sub): Likewise.
(fstat_i4): Likewise.
(fstat_i8): Likewise.
Jakub Jelinek [Wed, 18 Jun 2025 06:07:22 +0000 (08:07 +0200)]
dfp, real: Fix up FLOAT_EXPR/FIX_TRUNC_EXPR constant folding between dfp and large _BitInt [PR120631]
The following testcase shows that while at runtime we handle conversions
between _Decimal{64,128} and large _BitInt correctly, at compile time we
mishandle them in both directions, in one direction we end up in ICE in
decimal_from_integer callee because the char buffer is too short for the
needed number of decimal digits, in the conversion of dfp to large _BitInt
we return 0 in the wide_int.
The following patch fixes the ICE by using a larger buffer (XALLOCAVEC
allocated, it will never be larger than 65536 / 3 bytes) in the larger
_BitInt case, and the other direction by setting exponent to exp % 19
and instead multiplying the result by needed powers of 10^19 (10^19 chosen
as largest power of ten that can fit into UHWI).
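A sketch of the chunked scaling, using unsigned __int128 for illustration
(the real code multiplies a wide_int); 10^19 is the largest power of ten
below 2^64:
#include <stdint.h>

static const uint64_t TEN_POW_19 = 10000000000000000000ull;  /* 10^19 < 2^64 */

static unsigned __int128
scale_by_pow10 (unsigned __int128 val, int exp10)
{
  /* Peel off full 10^19 factors, then the remainder.  */
  for (; exp10 >= 19; exp10 -= 19)
    val *= TEN_POW_19;
  uint64_t rest = 1;
  while (exp10-- > 0)
    rest *= 10;
  return val * rest;
}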
2025-06-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* real.cc (decimal_from_integer): Add digits argument, if larger than
256, use XALLOCAVEC allocated buffer.
(real_from_integer): Pass val_in's precision divided by 3 to
decimal_from_integer.
* dfp.cc (decimal_real_to_integer): For precision > 128 if finite
and exponent is large, decrease exponent and multiply resulting
wide_int by powers of 10^19.
Pan Li [Tue, 17 Jun 2025 02:00:54 +0000 (10:00 +0800)]
RISC-V: Combine vec_duplicate + vmin.vv to vmin.vx on GR2VR cost
This patch combines vec_duplicate + vmin.vv into vmin.vx, as in the
example code below. The related pattern depends
on the cost of vec_duplicate from GR2VR: late-combine takes
action if the cost of GR2VR is zero, and rejects the combination
if the GR2VR cost is greater than zero.
Assume we have the example code below, with a GR2VR cost of 0.
#define DEF_VX_BINARY(T, FUNC) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = FUNC (in[i], x); \
}
int32_t min(int32_t a, int32_t b)
{
return a > b ? b : a;
}
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case SMIN.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op smin.
Lili Cui [Tue, 17 Jun 2025 13:39:38 +0000 (21:39 +0800)]
x86: Enable separate shrink wrapping
This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
enable separate shrink wrapping for function prologues/epilogues in
x86.
When performing separate shrink wrapping, we choose to use mov instead
of push/pop, because with push/pop it is more complicated to handle the rsp
adjustment and performance may be lost; mov has a small impact on code size
but guarantees performance.
Using mov means we need to use sub/add to maintain the stack frame. In
some special cases, we need to use lea to avoid affecting EFlags:
inserting a sub between test-je-jle would change EFlags, so lea should be
used there.
Tested against SPEC CPU 2017, this change always has a net-positive
effect on the dynamic instruction count. See the following table for
the breakdown on how this reduces the number of dynamic instructions
per workload on a like-for-like (with/without this commit):
Iain Sandoe [Mon, 16 Jun 2025 06:12:29 +0000 (09:12 +0300)]
c++, coroutines: Remove use of coroutine handle in the frame.
We have been keeping a copy of coroutine_handle<promise> in the state
frame, as it was expected to be efficient to use this to initialize the
argument to await_suspend. This does not turn out to be the case, and
initializing the value is obstructive to the CWG2563 fixes. This removes
the use.
gcc/cp/ChangeLog:
* coroutines.cc (struct coroutine_info): Update comments.
(struct coro_aw_data): Remove self_handle and add in
information to create the handle in lowering.
(expand_one_await_expression): Build a temporary coroutine
handle.
(build_actor_fn): Remove reference to the frame copy of the
coroutine handle.
(cp_coroutine_transform::wrap_original_function_body): Remove
reference to the frame copy of the coroutine handle.
Gaius Mulley [Tue, 17 Jun 2025 16:41:21 +0000 (17:41 +0100)]
PR modula2/120673: Mutually dependent types crash the compiler
This patch fixes an ICE which will occur if cyclic dependent types
are used when declaring a variable. This patch detects the
cyclic dependency and issues an error message for each outstanding
component.
gcc/m2/ChangeLog:
PR modula2/120673
* gm2-compiler/M2GCCDeclare.mod (ErrorDepList): New
global variable set containing every errant dependency symbol.
(mystop): Remove.
(EmitCircularDependancyError): Replace with ...
(EmitCircularDependencyError): ... this.
(AssertAllTypesDeclared): Rewrite.
(DoVariableDeclaration): Ditto.
(TypeDependentsDeclared): New procedure function.
(PrepareGCCVarDeclaration): Ditto.
(DeclareVariable): Remove assert.
(DeclareLocalVariable): Ditto.
(Constructor): Initialize ErrorDepList.
* gm2-compiler/M2MetaError.mod (doErrorScopeProc): Rewrite
and ensure that a symbol with a module scope does not lookup
from a definition module.
* gm2-compiler/P2SymBuild.mod (BuildType): Rewrite so that
a synonym type is created using the token referring to the name
on the lhs.
gcc/testsuite/ChangeLog:
PR modula2/120673
* gm2/pim/fail/badmodvar.mod: New test.
* gm2/pim/fail/cyclictypes.mod: New test.
* gm2/pim/fail/cyclictypes2.mod: New test.
* gm2/pim/fail/cyclictypes4.mod: New test.
Jan Hubicka [Tue, 17 Jun 2025 15:26:18 +0000 (17:26 +0200)]
Improve static and AFDO profile combination
This patch makes afdo_adjust_guessed_profile more aggressive in finding scales
on the boundaries of connected components with no annotation. Originally I
looked for edges into or out of the component with known AFDO counts, and I
also handled edges from basic blocks with known AFDO count and known static
probability estimate.
A common problem is with components containing no in edges, only out
edges (i.e. those with ENTRY_BLOCK). For this case I added logic that looks
for edges out of the component to BBs with known AFDO count. If all flow to
the BB is either from the component or has an AFDO count, we can determine the
scale precisely (see the sketch below). It may happen that there are edges
from other components; in this case we only know an upper bound and use it,
since it is better than nothing.
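A sketch of that boundary computation as I read the description (names
illustrative):
/* For a component with only outgoing edges: if every edge into DST
   either originates in the component or carries a known AFDO count,
   the scale is exact; otherwise it is only an upper bound.  */
static double
component_scale (double dst_afdo_count, double known_afdo_inflow,
                 double component_static_outflow)
{
  return (dst_afdo_count - known_afdo_inflow) / component_static_outflow;
}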
I also noticed that some components have a 0 count in all profiles and then
scaling gives up, which is fixed. I also optimized the code a bit by replacing
the map holding the current component with an array holding component IDs, and
broke out the scaling logic into separate functions.
The patch fixes a perl regression I introduced in the last change.
According to
https://lnt.opensuse.org/db_default/v4/SPEC/67674
there were improvements (percentage is runtime change):
This is a bit wild, but I hope things will settle down once we chase out
the obvious problems (such as losing the profile of functions that have not
been inlined).
gcc/ChangeLog:
* auto-profile.cc (afdo_indirect_call): Compute speculative edge
probability.
(add_scale): Break out from ...
(scale_bbs): Break out from ...
(afdo_adjust_guessed_profile): ... here; use component array instead of
current_component hash_map; handle components with only 0 profile;
be more aggressive in finding scales along the boundary.
Jan Hubicka [Tue, 17 Jun 2025 15:20:04 +0000 (17:20 +0200)]
Fix cgraph_node::apply_scale
While working on auto-FDO I noticed that we may run into an ICE because we
inline a function with count profile_count::zero into a call site with
profile_count::zero.
What may go wrong is that the caller has a local profile while the callee may
have an IPA profile.
We used to turn all such counts to 0, but that was changed by a short circuit
I introduced recently. Fixed thus.
* cgraph.cc (cgraph_node::apply_scale): Special case scaling
to profile_count::zero ().
(cgraph_node::verify_node): Add extra compatibility check.
Iain Sandoe [Mon, 9 Jun 2025 10:26:01 +0000 (11:26 +0100)]
c++,coroutines: Handle await expressions in assume attributes.
Here we have an expression that is not evaluated but is still seen
as potentially-evaluated. We handle this by determining whether the
operand has side effects; if so, we produce a warning that the assume has
been ignored and elide it.
gcc/cp/ChangeLog:
* coroutines.cc (analyze_expression_awaits): Elide assume
attributes containing await expressions, since these have
side effects. Emit a diagnostic that this has been done.
Jason Merrill [Wed, 20 Nov 2024 15:20:52 +0000 (16:20 +0100)]
c++: modules and #pragma diagnostic
To respect the #pragma diagnostic lines in libstdc++ headers when compiling
with module std, we need to represent them in the module.
I think it's reasonable to give serializers direct access to the underlying
data, as here with get_classification_history. This is a different approach
from how Jakub made PCH streaming members of diagnostic_option_classifier,
but it seems to me that modules handling belongs in module.cc.
libcpp/ChangeLog:
* line-map.cc (linemap_location_from_module_p): Add.
* include/line-map.h: Declare it.
Jakub Jelinek [Tue, 17 Jun 2025 11:20:11 +0000 (13:20 +0200)]
crc: Fix up ICE from optimize_crc_loop [PR120677]
The following testcase ICEs, because optimize_crc_loop inserts a call
statement before labels instead of after labels.
Fixed thusly (plus fixed other issues noticed around it).
2025-06-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120677
* gimple-crc-optimization.cc (crc_optimization::optimize_crc_loop):
Insert before gsi_after_labels instead of gsi_start_bb. Use
gimple_bb (output_crc) instead of output_crc->bb. Formatting fix.
aarch64: Add vec_set/extract for tuple modes [PR113027]
We generated inefficient code for bitfield references to Advanced
SIMD structure modes. In RTL, these modes are just extra-long
vectors, and so inserting and extracting an element is simply
a vec_set or vec_extract operation.
For the record, I don't think these modes should ever become fully
fledged vector modes. We shouldn't provide add, etc. for them.
But vec_set and vec_extract are the vector equivalent of insv
and extv. From that point of view, they seem closer to moves
than to arithmetic.
and the latter, libstdc++-v3/include/bits/ostream.tcc, has:
// Inhibit implicit instantiations for required instantiations,
// which are defined via explicit instantiations elsewhere.
#if _GLIBCXX_EXTERN_TEMPLATE
extern template class basic_ostream<char>;
extern template ostream& endl(ostream&);
Before this commit, omp_discover_declare_target_tgt_fn_r marked 'endl'
as (implicitly) declare target - but not the calls in it due to the
'extern' (DECL_EXTERNAL).
Thanks to inlining, 'endl' is (therefore) not used and, hence,
discarded by the linker; hence, it works with -O0 and -O1. However,
as the (unused) function still exists, IPA CP (enabled by -O2) will try
to do constant-value propagation and fails as the definition of 'widen'
is not available.
The solution is to still walk 'endl' despite it being an 'extern(al)' decl;
this has been restricted for now to DECL_DECLARED_INLINE_P.
gcc/ChangeLog:
* omp-offload.cc (omp_discover_declare_target_tgt_fn_r): Also
walk external functions that are declare inline (and have a
DECL_SAVED_TREE).
libgomp/ChangeLog:
* testsuite/libgomp.c++/declare_target-2.C: New test.
Some of the lookup code expects to find a valid (not UNKNOWN)
location, and this triggers in the reported case. To avoid this, we are
reverting the change to use UNKNOWN_LOCATION for synthesizing the
wrapper, and instead using the start and end locations of the original
function.
PR c++/120273
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Use
function start and end locations when synthesizing code.
(cp_coroutine_transform::cp_coroutine_transform): Set the
function end location.
James K. Lowden [Mon, 16 Jun 2025 15:43:35 +0000 (11:43 -0400)]
cobol: Some 1000 small changes in answer to cppcheck diagnostics.
Constification per cppcheck. Use STRICT_WARN and fix reported
diagnostics. Ignore [shadowVariable] in general. Use std::vector to
avoid exposing arrays as raw pointers.
Spencer Abson [Mon, 16 Jun 2025 19:31:30 +0000 (19:31 +0000)]
aarch64: Add support for unpacked SVE FP conversions
This patch introduces expanders for FP<-FP conversions that leverage
partial vector modes. We also extend the INT<-FP and FP<-INT conversions
using the same approach.
The ACLE enables vectorized conversions like the following:
fcvt z0.h, p7/m, z1.s
modelling the source vector as VNx4SF:
... | SF| SF| SF| SF|
and the destination as a VNx8HF, where this operation would yield:
... | 0 | HF| 0 | HF| 0 | HF| 0 | HF|
hence the useful results are stored unpacked, i.e.
... | X | HF| X | HF| X | HF| X | HF| (VNx4HF)
This patch allows the vectorizer to use this variant of fcvt as a
conversion from VNx4SF to VNx4HF. The same idea applies to widening
conversions, and between vectors with FP and integer base types.
If the source itself had been unpacked, e.g.
... | X | SF| X | SF| (VNx2SF)
The result would yield
... | X | X | X | HF| X | X | X | HF| (VNx2HF)
The upper bits of each container here are undefined, it's important to
avoid interpreting them during FP operations - doing so could introduce
spurious traps. The obvious route we've taken here is to mask undefined
lanes using the operation's predicate if we have flag_trapping_math.
The VPRED predicate mode (e.g. VNx2BI here) cannot do this; to ensure
correct behavior, we need a predicate mode that can control the data as if
it were fully-packed (VNx4BI).
Both VNx2BI and VNx4BI must be recognised as legal governing predicate modes
by the corresponding FP insns. In general, the governing predicate mode for
an insn could be any such with at least as many significant lanes as the data
mode. For example, addvnx4hf3 could be controlled by any of VNx{4,8,16}BI.
We implement 'aarch64_predicate_operand', a new define_special_predicate, to
achieve this.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_sve_valid_pred_p):
Declare helper for aarch64_predicate_operand.
(aarch64_sve_packed_pred): Declare helper for new expanders.
(aarch64_sve_fp_pred): Likewise.
* config/aarch64/aarch64-sve.md (<optab><mode><v_int_equiv>2):
Extend into...
(<optab><SVE_HSF:mode><SVE_HSDI:mode>2): New expander for converting
vectors of HF,SF to vectors of HI,SI,DI.
(<optab><VNx2DF_ONLY:mode><SVE_2SDI:mode>2): New expander for converting
vectors of SI,DI to vectors of DF.
(*aarch64_sve_<optab>_nontrunc<SVE_PARTIAL_F:mode><SVE_HSDI:mode>):
New pattern to match those we've added here.
(@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>): Extend
into...
(@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><SVE_SI:mode>): Match both
VNx2SI<-VNx2DF and VNx4SI<-VNx4DF.
(<optab><v_int_equiv><mode>2): Extend into...
(<optab><SVE_HSDI:mode><SVE_F:mode>2): New expander for converting vectors
of HI,SI,DI to vectors of HF,SF,DF.
(*aarch64_sve_<optab>_nonextend<SVE_HSDI:mode><SVE_PARTIAL_F:mode>): New
pattern to match those we've added here.
(trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>2): New expander to handle
narrowing ('truncating') FP<-FP conversions.
(*aarch64_sve_<optab>_trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>): New
pattern to handle those we've added here.
(extend<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>2): New expander to handle
widening ('extending') FP<-FP conversions.
(*aarch64_sve_<optab>_nontrunc<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>): New
pattern to handle those we've added here.
* config/aarch64/aarch64.cc (aarch64_sve_packed_pred): New function.
(aarch64_sve_fp_pred): Likewise.
(aarch64_sve_valid_pred_p): Likewise.
* config/aarch64/iterators.md (SVE_PARTIAL_HSF): New mode iterator.
(SVE_HSF): Likewise.
(SVE_SDF): Likewise.
(SVE_SI): Likewise.
(SVE_2SDI): Likewise.
(self_mask): Extend to all integer/FP vector modes.
(narrower_mask): Likewise (excluding QI).
* config/aarch64/predicates.md (aarch64_predicate_operand): New special
predicate to handle narrower predicate modes.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pack_fcvt_signed_1.c: Disable the aarch64 vector
cost model to preserve this test.
* gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/pack_float_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cvtf_1.c: New test.
* gcc.target/aarch64/sve/unpacked_cvtf_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cvtf_3.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fcvt_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fcvt_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fcvtz_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fcvtz_2.c: Likewise.
Spencer Abson [Mon, 16 Jun 2025 16:43:07 +0000 (16:43 +0000)]
aarch64: Extend iterator support for partial SVE FP modes
Define new iterators for partial floating-point modes, and cover these
in some existing mode_attrs. This patch serves as a starting point for
an effort to extend support for unpacked floating-point operations.
To differentiate between BFloat mode iterators that need to test
TARGET_SSVE_B16B16, and those that don't (see LOGICALF), this patch
enforces the following naming convention:
- _BF: BF16 modes will not test TARGET_SSVE_B16B16.
- _B16B16: BF16 modes will test TARGET_SSVE_B16B16.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md: Replace uses of SVE_FULL_F_BF
with SVE_FULL_F_B16B16.
Replace use of SVE_F with SVE_F_BF.
* config/aarch64/iterators.md (SVE_PARTIAL_F): New iterator for
partial SVE FP modes.
(SVE_FULL_F_BF): Rename to SVE_FULL_F_B16B16.
(SVE_PARTIAL_F_B16B16): New iterator (BF16 included) for partial
SVE FP modes.
(SVE_F_B16B16): New iterator for all SVE FP modes.
(SVE_BF): New iterator for all SVE BF16 modes.
(SVE_F): Redefine to exclude BF16 modes.
(SVE_F_BF): New iterator to replace the previous SVE_F.
(VPRED): Describe the VPRED mapping for partial vector modes.
(b): Cover partial FP modes.
(is_bf16): Likewise.
Harald Anlauf [Sun, 15 Jun 2025 19:09:28 +0000 (21:09 +0200)]
Fortran: fix checking of MOLD= in ALLOCATE statements [PR51961]
In ALLOCATE statements where the MOLD= argument is present and is not
scalar, and the allocate-object has an explicit-shape-spec, the standard
does not require the ranks to agree. In that case we skip the rank check,
but emit a warning if -Wsurprising is given.
PR fortran/51961
gcc/fortran/ChangeLog:
* resolve.cc (conformable_arrays): Use modified rank check when
MOLD= expression is given.