git.ipfire.org Git - thirdparty/gcc.git/log

c: Don't warn about converting NULL to different sso endian [PR104822]

In a similar way we don't warn about NULL pointer constant conversion to
a different named address we should not warn to a different sso endian
either.
This adds the simple check.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/104822

gcc/c/ChangeLog:

* c-typeck.cc (convert_for_assignment): Check for null pointer
before warning about an incompatible scalar storage order.

gcc/testsuite/ChangeLog:

* gcc.dg/sso-18.c: New test.
* gcc.dg/sso-19.c: New test.

ABOUT-GCC-NLS: add usage guidance

gcc/ChangeLog:

* ABOUT-GCC-NLS: Add usage guidance.

diagnostic: rename new permerror overloads

While checking another change, I noticed that the new permerror overloads
break gettext with "permerror used incompatibly as both
--keyword=permerror:2 --flag=permerror:2:gcc-internal-format and
--keyword=permerror:3 --flag=permerror:3:gcc-internal-format". So let's
change the name.

gcc/ChangeLog:

* diagnostic-core.h (permerror): Rename new overloads...
(permerror_opt): To this.
* diagnostic.cc: Likewise.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Adjust.

c++: use G_ instead of _

Since these strings are passed to error_at, they should be marked for
translation with G_, like other diagnostic messages, rather than _, which
forces immediate (redundant) translation. The use of N_ is less
problematic, but also imprecise.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_primary_expression): Use G_.
(cp_parser_using_enum): Likewise.
* decl.cc (identify_goto): Likewise.

ada: Support new SPARK aspect Side_Effects

SPARK RM 6.1.11 introduces a new aspect Side_Effects to denote
those functions which may have output parameters, write global
variables, raise exceptions and not terminate. This adds support
for this aspect and the corresponding pragma in the frontend.

Handling of this aspect in the frontend is very similar to
the handling of aspect Extensions_Visible: both are Boolean
aspects whose expression should be static, they can be specified
on the same entities, with the same rule of inheritance from
overridden to overriding primitives for tagged types.

There is no impact on code generation.

gcc/ada/

* aspects.ads: Add aspect Side_Effects.
* contracts.adb (Add_Pre_Post_Condition)
(Inherit_Subprogram_Contract): Add support for new contract.
* contracts.ads: Update comments.
* einfo-utils.adb (Get_Pragma): Add support.
* einfo-utils.ads (Prag): Update comment.
* errout.ads: Add explain codes.
* par-prag.adb (Prag): Add support.
* sem_ch13.adb (Analyze_Aspect_Specifications)
(Check_Aspect_At_Freeze_Point): Add support.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper)
(Analyze_Subprogram_Declaration): Call new analysis procedure to
check SPARK legality rules.
(Analyze_SPARK_Subprogram_Specification): New procedure to check
SPARK legality rules. Use an explain code for the error.
(Analyze_Subprogram_Specification): Move checks to new subprogram.
This code was effectively dead, as the kind for parameters was set
to E_Void at this point to detect early references.
* sem_ch6.ads (Analyze_Subprogram_Specification): Add new
procedure.
* sem_prag.adb (Analyze_Depends_In_Decl_Part)
(Analyze_Global_In_Decl_Part): Adapt legality check to apply only
to functions without side-effects.
(Analyze_If_Present): Extract functionality in new procedure
Analyze_If_Present_Internal.
(Analyze_If_Present_Internal): New procedure to analyze given
pragma kind.
(Analyze_Pragmas_If_Present): New procedure to analyze given
pragma kind associated with a declaration.
(Analyze_Pragma): Adapt support for Always_Terminates and
Exceptional_Cases. Add support for Side_Effects. Make sure to call
Analyze_If_Present to ensure pragma Side_Effects is analyzed prior
to analyzing pragmas Global and Depends. Use explain codes for the
errors.
* sem_prag.ads (Analyze_Pragmas_If_Present): Add new procedure.
* sem_util.adb (Is_Function_With_Side_Effects): New query function
to determine if a function is a function with side-effects.
* sem_util.ads (Is_Function_With_Side_Effects): Same.
* snames.ads-tmpl: Declare new names for pragma and aspect.
* doc/gnat_rm/implementation_defined_aspects.rst: Document new aspect.
* doc/gnat_rm/implementation_defined_pragmas.rst: Document new pragma.
* gnat_rm.texi: Regenerate.

ada: Refactor code to remove GNATcheck violation

Rewrite for loop containing an exit (which violates GNATcheck
rule Exits_From_Conditional_Loops), to use a while loop
which contains the exit criteria in its condition.
Also, move special case of first time through loop, to come
before loop.

gcc/ada/

* libgnat/s-imagef.adb (Set_Image_Fixed): Refactor loop.

ada: Add pragma Annotate for GNATcheck exemptions

Exempt the GNATcheck rule "Unassigned_OUT_Parameters"
with the rationale "the OUT parameter is assigned by component".

gcc/ada/

* libgnat/s-imguti.adb (Set_Decimal_Digits): Add pragma to exempt
Unassigned_OUT_Parameters.
(Set_Floating_Invalid_Value): Likewise

ada: Document gnatbind -Q switch

Add documentation for the -Q gnatbind switch in GNAT User's Guide and
improve gnatbind's help output for the switch to emphasize that it adds the
requested number of stacks to the secondary stack pool generated by the
binder.

gcc/ada/

* bindusg.adb (Display): Make it clear -Q adds to the number of
secondary stacks generated by the binder.
* doc/gnat_ugn/building_executable_programs_with_gnat.rst:
Document the -Q gnatbind switch and fix references to old
runtimes.
* gnat-style.texi: Regenerate.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: Seize opportunity to reuse List_Length

This patch is intended as a readability improvement. It doesn't
change the behavior of the compiler.

gcc/ada/

* sem_ch3.adb (Constrain_Array): Replace manual list length
computation by call to List_Length.

ada: Simplify "not Present" with "No"

gcc/ada/

* exp_aggr.adb (Expand_Container_Aggregate): Simplify with "No".

c++: Make -Wunknown-pragmas controllable by #pragma GCC diagnostic [PR89038]

As noted on the PR, commit r13-1544, the fix for PR53431, did not handle
the specific case of -Wunknown-pragmas, because that warning is issued
during preprocessing, but not by libcpp directly (it comes from the
cb_def_pragma callback). Address that by handling this pragma in
addition to libcpp pragmas during the early pragma handler.

gcc/c-family/ChangeLog:

PR c++/89038
* c-pragma.cc (handle_pragma_diagnostic_impl): Handle
-Wunknown-pragmas during early processing.

gcc/testsuite/ChangeLog:

PR c++/89038
* c-c++-common/cpp/Wunknown-pragmas-1.c: New test.

libcpp: testsuite: Add test for fixed _Pragma bug [PR82335]

This PR was fixed by r12-4797 and r12-5454. Add test coverage from the PR
that is not represented elsewhere.

gcc/testsuite/ChangeLog:

PR preprocessor/82335
* c-c++-common/cpp/diagnostic-pragma-3.c: New test.

middle-end: don't create LC-SSA PHI variables for PHI nodes who dominate loop

As the testcase shows, when a PHI node dominates the loop there is no new
definition inside the loop.  As such there would be no PHI nodes to update.

When we maintain LCSSA form we create an intermediate node in between the two
loops to thread alongt the value.  However later on when we update the second
loop we don't have any PHI nodes to update and so adjust_phi_and_debug_stmts
does nothing.   This leaves us with an incorrect phi node.  Normally this does
nothing and just gets ignored.  But in the case of the vUSE chain we end up
corrupting the chain.

As such whenever a PHI node's argument dominates the loop, we should remove
the newly created PHI node after edge redirection.

The one exception to this is when the loop has been versioned.  In such cases
the versioned loop may not use the value but the second loop can.

When this happens and we add the loop guard unless the join block has the PHI
it can't find the original value for use inside the guard block.

The next refactoring in the series moves the formation of the guard block
inside peeling itself.  Here we have all the information and wouldn't
need to re-create it later.

gcc/ChangeLog:

PR tree-optimization/111860
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Remove PHI nodes that dominate loop.

gcc/testsuite/ChangeLog:

PR tree-optimization/111860
* gcc.dg/vect/pr111860.c: New test.

tree-optimization/111131 - SLP for non-IFN gathers

The following implements SLP vectorization support for gathers
without relying on IFNs being pattern detected (and supported by
the target). That includes support for emulated gathers but also
the legacy x86 builtin path.

PR tree-optimization/111131
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
sure to update all gather/scatter stmt DRs, not only those
that eventually got VMAT_GATHER_SCATTER set.
* tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
(vect_get_and_check_slp_defs): Handle gathers/scatters,
adding the offset as SLP operand and comparing base and scale.
(vect_build_slp_tree_1): Handle gathers.
(vect_build_slp_tree_2): Likewise.

* gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
everywhere.
* gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
Massage the scale case to more reliably produce a different
one. Scan for the specific messages.
* gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
for AVX2, but not emulated.
* gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
Massage to more properly ensure this.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
everywhere.

Refactor x86 vectorized gather path

The following moves the builtin decl gather vectorization path along
the internal function and emulated gather vectorization paths,
simplifying the existing function down to generating the call and
required conversions to the actual argument types.  This thereby
exposes the unique support of two times larger number of offset
or data vector lanes.  It also makes the code path handle SLP
in principle (but SLP build needs adjustments for this, patch coming).

* tree-vect-stmts.cc (vect_build_gather_load_calls): Rename
to ...
(vect_build_one_gather_load_call): ... this.  Refactor,
inline widening/narrowing support ...
(vectorizable_load): ... here, do gather vectorization
with builtin decls along other gather vectorization.

aarch64: Generalise TFmode load/store pair patterns

This patch generalises the TFmode load/store pair patterns to TImode and
TDmode.  This brings them in line with the DXmode patterns, and uses the
same technique with separate mode iterators (TX and TX2) to allow for
distinct modes in each arm of the load/store pair.

For example, in combination with the post-RA load/store pair fusion pass
in the following patch, this improves the codegen for the following
varargs testcase involving TImode stores:

void g(void *);
int foo(int x, ...)
{
    __builtin_va_list ap;
    __builtin_va_start (ap, x);
    g(&ap);
    __builtin_va_end (ap);
}

from:

foo:
.LFB0:
stp x29, x30, [sp, -240]!
.LCFI0:
mov w9, -56
mov w8, -128
mov x29, sp
add x10, sp, 176
stp x1, x2, [sp, 184]
add x1, sp, 240
add x0, sp, 16
stp x1, x1, [sp, 16]
str x10, [sp, 32]
stp w9, w8, [sp, 40]
str q0, [sp, 48]
str q1, [sp, 64]
str q2, [sp, 80]
str q3, [sp, 96]
str q4, [sp, 112]
str q5, [sp, 128]
str q6, [sp, 144]
str q7, [sp, 160]
stp x3, x4, [sp, 200]
stp x5, x6, [sp, 216]
str x7, [sp, 232]
bl g
ldp x29, x30, [sp], 240
.LCFI1:
ret

to:

foo:
.LFB0:
stp x29, x30, [sp, -240]!
.LCFI0:
mov w9, -56
mov w8, -128
mov x29, sp
add x10, sp, 176
stp x1, x2, [sp, 1bd4971b7c71e70a637a1dq84]
add x1, sp, 240
add x0, sp, 16
stp x1, x1, [sp, 16]
str x10, [sp, 32]
stp w9, w8, [sp, 40]
stp q0, q1, [sp, 48]
stp q2, q3, [sp, 80]
stp q4, q5, [sp, 112]
stp q6, q7, [sp, 144]
stp x3, x4, [sp, 200]
stp x5, x6, [sp, 216]
str x7, [sp, 232]
bl g
ldp x29, x30, [sp], 240
.LCFI1:
ret

Note that this patch isn't neeed if we only use the mode
canonicalization approach in the new ldp fusion pass (since we
canonicalize T{I,F,D}mode to V16QImode), but we seem to get slightly
better performance with mode canonicalization disabled (see
--param=aarch64-ldp-canonicalize-modes in the following patch).

gcc/ChangeLog:

* config/aarch64/aarch64.md (load_pair_dw_tftf): Rename to ...
(load_pair_dw_<TX:mode><TX2:mode>): ... this.
(store_pair_dw_tftf): Rename to ...
(store_pair_dw_<TX:mode><TX2:mode>): ... this.
* config/aarch64/iterators.md (TX2): New.

aarch64, testsuite: Fix up pr71727.c

The test is trying to check that we don't use q-register stores with
-mstrict-align, so actually check specifically for that.

This is a prerequisite to avoid regressing:

scan-assembler-not "add\tx0, x0, :"

with the upcoming ldp fusion pass, as we change where the ldps are
formed such that a register is used rather than a symbolic (lo_sum)
address for the first load.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr71727.c: Adjust scan-assembler-not to
make sure we don't have q-register stores with -mstrict-align.

aarch64, testsuite: Tweak sve/pcs/args_9.c to allow stps

With the new ldp/stp pass enabled, there is a change in the codegen for
this test as follows:

        add     x8, sp, 16
        ptrue   p3.h, mul3
        str     p3, [x8]
-       str     x8, [sp, 8]
-       str     x9, [sp]
+       stp     x9, x8, [sp]
        ptrue   p3.d, vl8
        ptrue   p2.s, vl7
        ptrue   p1.h, vl6

i.e. we now form an stp that we were missing previously. This patch
adjusts the scan-assembler such that it should pass whether or not
we form the stp.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pcs/args_9.c: Adjust scan-assemblers to
allow for stp.

aarch64, testsuite: Prevent stp in lr_free_1.c

The test is looking for individual stores which are able to be merged
into stp instructions. The test currently passes -fno-schedule-fusion
-fno-peephole2, presumably to prevent these stores from being turned
into stps, but this is no longer sufficient with the new ldp/stp fusion
pass.

As such, we add --param=aarch64-stp-policy=never to prevent stps being
formed.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/lr_free_1.c: Add
--param=aarch64-stp-policy=never to dg-options.

rtl-ssa: Support inferring uses of mem in change_insns

Currently, rtl_ssa::change_insns requires all new uses and defs to be
specified explicitly.  This turns out to be rather inconvenient for
forming load pairs in the new aarch64 load pair pass, as the pass has to
determine which mem def the final load pair consumes, and then obtain or
create a suitable use (i.e. significant bookkeeping, just to keep the
RTL-SSA IR consistent).  It turns out to be much more convenient to
allow change_insns to infer which def is consumed and create a suitable
use of mem itself.  This patch does that.

gcc/ChangeLog:

* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new
parameter to give final insn position, infer use of mem if it isn't
specified explicitly.
(function_info::change_insns): Pass down final insn position to
finalize_new_accesses.
* rtl-ssa/functions.h: Add parameter to finalize_new_accesses.

rtl-ssa: Add entry point to allow re-parenting uses

This is needed by the upcoming aarch64 load pair pass, as it can
re-order stores (when alias analysis determines this is safe) and thus
change which mem def a given use consumes (in the RTL-SSA view, there is
no alias disambiguation of memory).

gcc/ChangeLog:

* rtl-ssa/accesses.cc (function_info::reparent_use): New.
* rtl-ssa/functions.h (function_info): Declare new member
function reparent_use.

rtl-ssa: Add drop_memory_access helper

Add a helper routine to access-utils.h which removes the memory access
from an access_array, if it has one.

gcc/ChangeLog:

* rtl-ssa/access-utils.h (drop_memory_access): New.

rtl-ssa: Fix bug in function_info::add_insn_after

In the case that !insn->is_debug_insn () && next->is_debug_insn (), this
function was missing an update of the prev pointer on the first nondebug
insn following the sequence of debug insns starting at next.

This can lead to corruption of the insn chain, in that we end up with:

insn->next_any_insn ()->prev_any_insn () != insn

in this case. This patch fixes that.

gcc/ChangeLog:

* rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we
update the prev pointer on the following nondebug insn in the
case that !insn->is_debug_insn () && next->is_debug_insn ().

x86: Correct ISA enabled for clients since Arrow Lake

gcc/ChangeLog:

* config/i386/i386.h: Correct the ISA enabled for Arrow Lake.
Also make Clearwater Forest depends on Sierra Forest.
* config/i386/i386-options.cc: Revise the order of the macro
definition to avoid confusion.
* doc/extend.texi: Revise documentation.
* doc/invoke.texi: Correct documentation.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Group Clearwater Forest
with atom cores.

amdgcn: deprecate Fiji device and multilib

LLVM wants to remove it, which breaks our build. This patch means that
most users won't notice that change, when it comes, and those that do will
have chosen to enable Fiji explicitly.

I'm selecting gfx900 as the new default as that's the least likely for users
to want, which means most users will specify -march explicitly, which means
we'll be free to change the default again, when we need to, without breaking
anybody's makefiles.

gcc/ChangeLog:

* config.gcc (amdgcn): Switch default to --with-arch=gfx900.
Implement support for --with-multilib-list.
* config/gcn/t-gcn-hsa: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Mark Fiji deprecated.

LoongArch:Implement the new vector cost model framework.

This patch make loongarch use the new vector hooks and implements the costing
function determine_suggested_unroll_factor, to make it be able to suggest the
unroll factor for a given loop being vectorized base vec_ops analysis during
vector costing and the available issue information. Referring to aarch64 and
rs6000 port.

The patch also reduces the cost of unaligned stores, making it equal to the
cost of aligned ones in order to avoid odd alignment peeling.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_vector_costs): Inherit from
vector_costs. Add a constructor.
(loongarch_vector_costs::add_stmt_cost): Use adjust_cost_for_freq to
adjust the cost for inner loops.
(loongarch_vector_costs::count_operations): New function.
(loongarch_vector_costs::determine_suggested_unroll_factor): Ditto.
(loongarch_vector_costs::finish_cost): Ditto.
(loongarch_builtin_vectorization_cost): Adjust.
* config/loongarch/loongarch.opt (loongarch-vect-unroll-limit): New parameter.
(loongarcg-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* config/loongarch/genopts/loongarch.opt.in
(loongarch-vect-unroll-limit): Ditto.
(loongarcg-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* doc/invoke.texi (loongarcg-vect-unroll-limit): Document new option.

LoongArch:Implement vec_widen standard names.

Add support for vec_widen lo/hi patterns. These do not directly
match on Loongarch lasx instructions but can be emulated with
even/odd + vector merge.

gcc/ChangeLog:

* config/loongarch/lasx.md
(vec_widen_<su>mult_even_v8si): New patterns.
(vec_widen_<su>add_hi_<mode>): Ditto.
(vec_widen_<su>add_lo_<mode>): Ditto.
(vec_widen_<su>sub_hi_<mode>): Ditto.
(vec_widen_<su>sub_lo_<mode>): Ditto.
(vec_widen_<su>mult_hi_<mode>): Ditto.
(vec_widen_<su>mult_lo_<mode>): Ditto.
* config/loongarch/loongarch.md (u_bool): New iterator.
* config/loongarch/loongarch-protos.h
(loongarch_expand_vec_widen_hilo): New prototype.
* config/loongarch/loongarch.cc
(loongarch_expand_vec_interleave): New function.
(loongarch_expand_vec_widen_hilo): New function.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-widen-add.c: New test.
* gcc.target/loongarch/vect-widen-mul.c: New test.
* gcc.target/loongarch/vect-widen-sub.c: New test.

LoongArch:Implement avg and sad standard names.

gcc/ChangeLog:

* config/loongarch/lasx.md
(avg<mode>3_ceil): New patterns.
(uavg<mode>3_ceil): Ditto.
(avg<mode>3_floor): Ditto.
(uavg<mode>3_floor): Ditto.
(usadv32qi): Ditto.
(ssadv32qi): Ditto.
* config/loongarch/lsx.md
(avg<mode>3_ceil): New patterns.
(uavg<mode>3_ceil): Ditto.
(avg<mode>3_floor): Ditto.
(uavg<mode>3_floor): Ditto.
(usadv16qi): Ditto.
(ssadv16qi): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/avg-ceil-lasx.c: New test.
* gcc.target/loongarch/avg-ceil-lsx.c: New test.
* gcc.target/loongarch/avg-floor-lasx.c: New test.
* gcc.target/loongarch/avg-floor-lsx.c: New test.
* gcc.target/loongarch/sad-lasx.c: New test.
* gcc.target/loongarch/sad-lsx.c: New test.

Daily bump.

Fix expansion of `(a & 2) != 1`

I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As afar as I know this can only show up when disabling optimization
passes as this above form would have been optimized away.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/111863

gcc/ChangeLog:

* expr.cc (do_store_flag): Don't over write arg0
when stripping off `& POW2`.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr111863-1.c: New test.

[c] Fix PR 101364: ICE after error due to diagnose_arglist_conflict not checking for error

When checking to see if we have a function declaration has a conflict due to
promotations, there is no test to see if the type was an error mark and then calls
c_type_promotes_to. c_type_promotes_to is not ready for error_mark and causes an
ICE.

This adds a check for error before the call of c_type_promotes_to.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/101364

gcc/c/ChangeLog:

* c-decl.cc (diagnose_arglist_conflict): Test for
error mark before calling of c_type_promotes_to.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101364-1.c: New test.

Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node

This is a simple error recovery issue when c_safe_arg_type_equiv_p
was added in r8-5312-gc65e18d3331aa999. The issue is that after
an error, an argument type (of a function type) might turn
into an error mark node and c_safe_arg_type_equiv_p was not ready
for that. So this just adds a check for error operand for its
arguments before getting the main variant.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/101285

gcc/c/ChangeLog:

* c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error
operands early.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101285-1.c: New test.

PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.

gcc/ChangeLog:
PR tree-optimization/111648
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1
chooses base element from arg, ensure that it's a natural stepped
sequence.
(build_vec_cst_rand): New param natural_stepped and use it to
construct a naturally stepped sequence.
(test_nunits_min_2): Add new unit tests Case 6 and Case 7.

pru: Implement TARGET_INSN_COST

This patch slightly improves the embench-iot benchmark score for
PRU code size.  There is also small improvement in a few real-world
firmware programs.

  Embench-iot size
  ------------------------------------------
  Benchmark          before   after    delta
  ---------           ----    ----     -----
  aha-mont64          4.15    4.15         0
  crc32               6.04    6.04         0
  cubic              21.64   21.62     -0.02
  edn                 6.37    6.37         0
  huffbench          18.63   18.55     -0.08
  matmult-int         5.44    5.44         0
  md5sum             25.56   25.43     -0.13
  minver             12.82   12.76     -0.06
  nbody              15.09   14.97     -0.12
  nettle-aes          4.75    4.75         0
  nettle-sha256       4.67    4.67         0
  nsichneu            3.77    3.77         0
  picojpeg            4.11    4.11         0
  primecount          7.90    7.90         0
  qrduino             7.18    7.16     -0.02
  sglib-combined     13.63   13.59     -0.04
  slre                5.19    5.19         0
  st                 14.23   14.12     -0.11
  statemate           2.34    2.34         0
  tarfind            36.85   36.64     -0.21
  ud                 10.51   10.46     -0.05
  wikisort            7.44    7.41     -0.03
  ---------          -----   -----
  Geometric mean      8.42    8.40     -0.02
  Geometric SD        2.00    2.00         0
  Geometric range    12.68   12.62     -0.06

gcc/ChangeLog:

* config/pru/pru.cc (pru_insn_cost): New function.
(TARGET_INSN_COST): Define for PRU.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

LibF7: Implement mul_mant for devices without MUL instruction.

libgcc/config/avr/libf7/
* libf7-asm.sx (mul_mant): Implement for devices without MUL.
* asm-defs.h (wmov) [!HAVE_MUL]: Fix regno computation.
* t-libf7 (F7_ASM_FLAGS): Add -g0.

aarch64: Replace duplicated selftests

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_test_fractional_cost):
Test <= instead of testing < twice.

cse: Workaround GCC < 5 bug in cse_insn [PR111852]

Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile
cse.cc which contains a variable with rtx_def type, because rtx_def
contains a union with poly_uint16 element. poly_int template has
defaulted default constructor and a variadic template constructor which
could have empty parameter pack. GCC < 5 treated it as non-trivially
constructible class and deleted rtunion and rtx_def default constructors.

For the cse_insn purposes, all we need is a variable with size and alignment
of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and
fill in like rtx is normally allocated from heap, so this patch for
GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment.

2023-10-18 Jakub Jelinek <jakub@redhat.com>

PR bootstrap/111852
* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of
using rtx_def type for memory_extend_buf, use unsigned char
arrayy with size of rtx_def and its alignment.

diagnostic: add permerror variants with opt

In the discussion of promoting some pedwarns to be errors by default, rather
than move them all into -fpermissive it seems to me to make sense to support
DK_PERMERROR with an option flag. This way will also work with
-fpermissive, but users can also still use -Wno-error=narrowing to downgrade
that specific diagnostic rather than everything affected by -fpermissive.

So, for diagnostics that we want to make errors by default we can just
change the pedwarn call to permerror.

The tests check desired behavior for such a permerror in a system header
with various flags. The patch preserves the existing permerror behavior of
ignoring -w and system headers by default, but respecting them when
downgraded to a warning by -fpermissive.

This seems similar to but a bit better than the approach of forcing
-pedantic-errors that I previously used for -Wnarrowing: specifically, in
that now -w by itself is not enough to silence the -Wnarrowing
error (integer-pack2.C).

gcc/ChangeLog:

* doc/invoke.texi: Move -fpermissive to Warning Options.
* diagnostic.cc (update_effective_level_from_pragmas): Remove
redundant system header check.
(diagnostic_report_diagnostic): Move down syshdr/-w check.
(diagnostic_impl): Handle DK_PERMERROR with an option number.
(permerror): Add new overloads.
* diagnostic-core.h (permerror): Declare them.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Use permerror.

gcc/testsuite/ChangeLog:

* g++.dg/ext/integer-pack2.C: Add -fpermissive.
* g++.dg/diagnostic/sys-narrow.h: New test.
* g++.dg/diagnostic/sys-narrow1.C: New test.
* g++.dg/diagnostic/sys-narrow1a.C: New test.
* g++.dg/diagnostic/sys-narrow1b.C: New test.
* g++.dg/diagnostic/sys-narrow1c.C: New test.
* g++.dg/diagnostic/sys-narrow1d.C: New test.
* g++.dg/diagnostic/sys-narrow1e.C: New test.
* g++.dg/diagnostic/sys-narrow1f.C: New test.
* g++.dg/diagnostic/sys-narrow1g.C: New test.
* g++.dg/diagnostic/sys-narrow1h.C: New test.
* g++.dg/diagnostic/sys-narrow1i.C: New test.

OpenMP: Avoid ICE with LTO and 'omp allocate'

gcc/ChangeLog:

* gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute
to avoid that auxillary statement list reaches LTO.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-13a.f90: New test.

tree-ssa-math-opts: Fix up match_uaddc_usubc [PR111845]

GCC ICEs on the first testcase.  Successful match_uaddc_usubc ends up with
some dead stmts which DCE will remove (hopefully) later all.
The ICE is because one of the dead stmts refers to a freed SSA_NAME.
The code already gsi_removes a couple of stmts in the
  /* Remove some statements which can't be kept in the IL because they
     use SSA_NAME whose setter is going to be removed too.  */
section for the same reason (the reason for the freed SSA_NAMEs is that
we don't really have a replacement for those cases - all we have after
a match is combined overflow from the addition/subtraction of 2 operands + a
[0, 1] carry in, but not the individual overflows from the former 2
additions), but for the last (most significant) limb case, where we try
to match x = op1 + op2 + carry1 + carry2; or
x = op1 - op2 - carry1 - carry2; we just gsi_replace the final stmt, but
left around the 2 temporary stmts as dead; if we were unlucky enough that
those referenced the carry flag that went away, it ICEs.

So, the following patch remembers those temporary statements (rather than
trying to rediscover them more expensively) and removes them before the
final one is replaced.

While working on it, I've noticed we didn't support all the reassociated
possibilities of writing the addition of 4 operands or subtracting 3
operands from one, we supported e.g.
x = ((op1 + op2) + op3) + op4;
x = op1 + ((op2 + op3) + op4);
but not
x = (op1 + (op2 + op3)) + op4;
x = op1 + (op2 + (op3 + op4));
Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what
we were searching for (if non-NULL) - rhs[0] is inspected in the first
loop and has different handling for the MINUS_EXPR case.

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/111845
* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary
statements for the 4 operand addition or subtraction of 3 operands
from 1 operand cases and remove them when successful.  Look for
nested additions even from rhs[2], not just rhs[1].

* gcc.dg/pr111845.c: New test.
* gcc.target/i386/pr111845.c: New test.

nvptx: Use fatal_error when -march= is missing not an assert [PR111093]

gcc/ChangeLog:

PR target/111093
* config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error
instead of an assert ICE when no -march= has been specified.

Darwin: Check as for .build_version support and use it if available.

This adds support for the minimum OS version data in assembler files.
At present, we have no mechanism to detect the SDK version in use, and
so that is omitted from build_versions.

We follow the implementation in clang, '.build_version' is only emitted
(where supported) for target macOS versions >= 10.14. For earlier macOS
we fall back to using a '.macosx_version_min' directive. This latter is
also emitted when the assembler supports it, but not build_version.

gcc/ChangeLog:

* config.in: Regenerate.
* config/darwin.cc (darwin_file_start): Add assembler directives
for the target OS version, where these are supported by the
assembler.
(darwin_override_options): Check for building >= macOS 10.14.
* configure: Regenerate.
* configure.ac: Check for assembler support of .build_version
directives.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

ifcvt: rewrite args handling to remove lookups

This refactors the code to remove the args cache and index lookups
in favor of a single structure. It also again, removes the use of
std::sort as previously requested but avoids the new asserts in
trunk.

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
(typedef struct ifcvt_arg_entry): New.
(cmp_arg_entry): New.
(gen_phi_arg_condition, gen_phi_nest_statement,
predicate_scalar_phi): Use them.

AArch64: Rewrite simd move immediate patterns to new syntax

This rewrites the simd MOV patterns to use the new compact syntax.
No change in semantics is expected. This will be needed in follow on patches.

This also merges the splits into the define_insn which will also be needed soon.

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Rewrite to new syntax.
(*aarch64_simd_mov<VQMOV:mode): Rewrite to new syntax and merge in
splits.

middle-end: ifcvt: Allow any const IFN in conditional blocks

When ifcvt was initially added masking was not a thing and as such it was
rather conservative in what it supported.

For builtins it only allowed C99 builtin functions which it knew it can fold
away.

These days the vectorizer is able to deal with needing to mask IFNs itself.
vectorizable_call is able vectorize the IFN by emitting a VEC_PERM_EXPR after
the operation to emulate the masking.

This is then used by match.pd to conver the IFN into a masked variant if it's
available.

For these reasons the restriction in ifconvert is no longer require and we
needless block vectorization when we can effectively handle the operations.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note: This patch is part of a testseries and tests for it are added in the
AArch64 patch that adds supports for the optab.

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.

middle-end: Fold vec_cond into conditional ternary or binary operation when sharing operand [PR109154]

When we have a vector conditional on a masked target which is doing a selection
on the result of a conditional operation where one of the operands of the
conditional operation is the other operand of the select, then we can fold the
vector conditional into the operation.

Concretely this transforms

  c = mask1 ? (masked_op mask2 a b) : b

into

  c = masked_op (mask1 & mask2) a b

The mask is then propagated upwards by the compiler.  In the SVE case we don't
end up needing a mask AND here since `mask2` will end up in the instruction
creating `mask` which gives us a natural &.

Such transformations are more common now in GCC 13+ as PRE has not started
unsharing of common code in case it can make one branch fully independent.

e.g. in this case `b` becomes a loop invariant value after PRE.

This transformation removes the extra select for masked architectures but
doesn't fix the general case.

gcc/ChangeLog:

PR tree-optimization/109154
* match.pd: Add new cond_op rule.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/sve/pre_cond_share_1.c: New test.

LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

During the review of an LLVM change [1], on LA464 we found that zeroing
an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing a fcc.

Re-instantiate integer mask to traditional vector mask support

The following allows to pass integer mask data as traditional
vector mask for OMP SIMD clone calls which is required due to
the limited set of OMP SIMD clones in the x86 ABI when using
AVX512 but a prefered vector size of 256 bits.

* tree-vect-stmts.cc (vectorizable_simd_clone_call):
Relax check to again allow passing integer mode masks
as traditional vectors.

middle-end: maintain LCSSA throughout loop peeling

This final patch updates peeling to maintain LCSSA all the way through.

It's significantly easier to maintain it during peeling while we still know
where all new edges connect rather than touching it up later as is currently
being done.

This allows us to remove many of the helper functions that touch up the loops
at various parts.  The only complication is for loop distribution where we
should be able to use the same,  however ldist depending on whether
redirect_lc_phi_defs is true or not will either try to maintain a limited LCSSA
form itself or removes are non-virtual phis.

The problem here is that if we maintain LCSSA then in some cases the blocks
connecting the two loops get PHIs to keep the loop IV up to date.

However there is no loop, the guard condition is rewritten as 0 != 0, to the
"loop" always exits.   However due to the PHI nodes the probabilities get
completely wrong.  It seems to think that the impossible exit is the likely
edge.  This causes incorrect warnings and the presence of the PHIs prevent the
blocks to be simplified.

While it may be possible to make ldist work with LCSSA form, doing so seems more
work than not.  For that reason the peeling code has an additional parameter
used by only ldist to not connect the two loops during peeling.

This preserves the current behaviour from ldist until I can dive into the
implementation more.  Hopefully that's ok for now.

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Request no LCSSA.
* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
asserts.
(slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling.
(find_guard_arg): Look value up through explicit edge and original defs.
(vect_do_peeling): Use it.
(slpeel_update_phi_nodes_for_guard2): Take explicit exit edge.
(slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops):
Remove.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi.
* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add
optional param to turn off LCSSA mode.

middle-end: updated niters analysis to handle multiple exits.

This second part updates niters analysis to be able to analyze any number of
exits.  If we have multiple exits we determine the main exit by finding the
first counting IV.

The change allows the vectorizer to pass analysis for multiple loops, but we
later gracefully reject them.  It does however allow us to test if the exit
handling is using the right exit everywhere.

Additionally since we analyze all exits, we now return all conditions for them
and determine which condition belongs to the main exit.

The main condition is needed because the vectorizer needs to ignore the main IV
condition during vectorization as it will replace it during codegen.

To track versioned loops we extend the contract between ifcvt and the vectorizer
to store the exit number in aux so that we can match it up again during peeling.

gcc/ChangeLog:

* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
it.
* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
(vec_init_loop_exit_info): Extend analysis when multiple exits.
(vect_analyze_loop_form): Record conds and determine main cond.
(vect_create_loop_vinfo): Extend bookkeeping of conds.
(vect_analyze_loop): Release conds.
* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
LOOP_VINFO_LOOP_IV_COND):  New.
(struct vect_loop_form_info): Add conds, alt_loop_conds;
(struct loop_vec_info): Add conds, loop_iv_cond.

middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

This is extracted out of the patch series to support early break vectorization
in order to simplify the review of that patch series.

The goal of this one is to separate out the refactoring from the new
functionality.

This first patch separates out the vectorizer's definition of an exit to their
own values inside loop_vinfo.  During vectorization we can have three separate
copies for each loop: scalar, vectorized, epilogue.  The scalar loop can also be
the versioned loop before peeling.

Because of this we track 3 different exits inside loop_vinfo corresponding to
each of these loops.  Additionally each function that uses an exit, when not
obviously clear which exit is needed will now take the exit explicitly as an
argument.

This is because often times the callers switch the loops being passed around.
While the caller knows which loops it is, the callee does not.

For now the loop exits are simply initialized to same value as before determined
by single_exit (..).

No change in functionality is expected throughout this patch series.

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly.
(loop_distribution::distribute_loop): Bail out of not single exit.
* tree-scalar-evolution.cc (get_loop_exit_condition): New.
* tree-scalar-evolution.h (get_loop_exit_condition): New.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit
explicitly.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
vect_set_loop_condition_partial_vectors_avx512,
vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly
take exit.
(slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and
return new peeled corresponding peeled exit.
(slpeel_can_duplicate_loop_p): Explicitly take exit.
(find_loop_location): Handle not knowing an explicit exit.
(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf,
find_guard_arg, slpeel_update_phi_nodes_for_loops,
slpeel_update_phi_nodes_for_guard2): Use new exits.
(vect_do_peeling): Update bookkeeping to keep track of exits.
* tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to
analyze.
(vec_init_loop_exit_info): New.
(_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv,
vec_epilogue_loop_iv, scalar_loop_iv.
(vect_analyze_loop_form): Initialize exits.
(vect_create_loop_vinfo): Set main exit.
(vect_create_epilog_for_reduction, vectorizable_live_operation,
vect_transform_loop): Use it.
(scale_profile_for_vect_loop): Explicitly take exit to scale.
* tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit.
* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT,
LOOP_VINFO_SCALAR_IV_EXIT): New.
(struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv,
scalar_loop_iv.
(vect_set_loop_condition, slpeel_can_duplicate_loop_p,
slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits.
(vec_init_loop_exit_info): New.
(struct vect_loop_form_info): Add loop_exit.

middle-end: refactor vectorizable_comparison to make the main body re-usable.

Vectorization of a gcond starts off essentially the same as vectorizing a
comparison witht he only difference being how the operands are extracted.

This refactors vectorable_comparison such that we now have a generic function
that can be used from vectorizable_early_break. The refactoring splits the
gassign checks and actual validation/codegen off to a helper function.

No change in functionality expected.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body
to ...
(vectorizable_comparison_1): ...This.

RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx

This patch optimize this following permutation with consecutive patterns index:

typedef char vnx16i __attribute__ ((vector_size (16)));

#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15

vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, MASK_16);
}

Before this patch:

        lui     a5,%hi(.LC0)
        addi    a5,a5,%lo(.LC0)
        vsetivli        zero,16,e8,m1,ta,ma
        vle8.v  v3,0(a5)
        vle8.v  v2,0(a1)
        vrgather.vv     v1,v2,v3
        vse8.v  v1,0(a0)
        ret

After this patch:

vsetivli zero,16,e8,mf8,ta,ma
vle8.v v2,0(a1)
vsetivli zero,4,e32,mf2,ta,ma
vrgather.vi v1,v2,3
vsetivli zero,16,e8,mf8,ta,ma
vse8.v v1,0(a0)
ret

Overal reduce 1 instruction which is vector load instruction which is much more expansive
than VL toggling.

Also, with this patch, we are using vrgather.vi which reduce 1 vector register consumption.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function.
(expand_vec_perm_const_1): Add consecutive pattern recognition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add new test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.

fortran/intrinsic.texi: Add 'intrinsic' to SIGNAL example

gcc/fortran/ChangeLog:

* intrinsic.texi (signal): Add 'intrinsic :: signal, sleep' to
the example to make it safer.

Initial Panther Lake Support

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Add Panther
Lake.
* common/config/i386/i386-common.cc (processor_name):
Ditto.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_PANTHERLAKE.
* config.gcc: Add -march=pantherlake.
* config/i386/driver-i386.cc (host_detect_local_cpu): Refactor
the if clause. Handle pantherlake.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Handle pantherlake.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_PANTHERLAKE): New.
(m_CORE_HYBRID): Add pantherlake.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.

x86: Add m_CORE_HYBRID for hybrid clients tuning

gcc/Changelog:

* config/i386/i386-options.cc (m_CORE_HYBRID): New.
* config/i386/x86-tune.def: Replace hybrid client tune to
m_CORE_HYBRID.

Initial Clearwater Forest Support

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Clearwater Forest.
* common/config/i386/i386-common.cc (processor_name):
Add Clearwater Forest.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_CLEARWATERFOREST.
* config.gcc: Add -march=clearwaterforest.
* config/i386/driver-i386.cc (host_detect_local_cpu): Handle
clearwaterforest.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_CLEARWATERFOREST): New.
(m_CORE_ATOM): Add clearwaterforest.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.

Support 32/64-bit vectorization for _Float16 fma related operations.

gcc/ChangeLog:

* config/i386/mmx.md (fma<mode>4): New expander.
(fms<mode>4): Ditto.
(fnma<mode>4): Ditto.
(fnms<mode>4): Ditto.
(vec_fmaddsubv4hf4): Ditto.
(vec_fmsubaddv4hf4): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-fmaddsubhf-1.c: New test.
* gcc.target/i386/part-vect-fmahf-1.c: New test.

RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]

Last time, Robin has mentioned that dynamic LMUL will cause ICE in SPEC:

https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html

which is caused by assertion FAIL.

When we enable more currents in rvv.exp with dynamic LMUL, such issue can be
reproduced and has a PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832

Now, we enable more tests in rvv.exp in this patch and fix the bug.

PR target/111832

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.

Daily bump.

aarch64: Put LR save slot first in more cases

Now that the prologue and epilogue code iterates over saved
registers in offset order, we can put the LR save slot first
without compromising LDP/STP formation.

This isn't worthwhile when shadow call stacks are enabled, since the
first two registers are also push/pop candidates, and LR cannot be
popped when shadow call stacks are enabled. (LR is instead loaded
first and compared against the shadow stack's value.)

But otherwise, it seems better to put the LR save slot first,
to reduce unnecessary variation with the layout for stack clash
protection.

gcc/
* config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make
the position of the LR save slot dependent on stack clash
protection unless shadow call stacks are enabled.

gcc/testsuite/
* gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19.
* gcc.target/aarch64/test_frame_4.c: Likewise.
* gcc.target/aarch64/test_frame_7.c: Likewise.
* gcc.target/aarch64/test_frame_10.c: Likewise.

aarch64: Use vecs to store register save order

aarch64_save/restore_callee_saves looped over registers in register
number order.  This in turn meant that we could only use LDP and STP
for registers that were consecutive both number-wise and
offset-wise (after unsaved registers are excluded).

This patch instead builds lists of the registers that we've decided to
save, in offset order.  We can then form LDP/STP pairs regardless of
register number order, which in turn means that we can put the LR save
slot first without losing LDP/STP opportunities.

gcc/
* config/aarch64/aarch64.h (aarch64_frame): Add vectors that
store the list saved GPRs, FPRs and predicate registers.
* config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize
the lists of saved registers.  Use them to choose push candidates.
Invalidate pop candidates if we're not going to do a pop.
(aarch64_next_callee_save): Delete.
(aarch64_save_callee_saves): Take a list of registers,
rather than a range.  Make !skip_wb select only write-back
candidates.
(aarch64_expand_prologue): Update calls accordingly.
(aarch64_restore_callee_saves): Take a list of registers,
rather than a range.  Always skip pop candidates.  Also skip
LR if shadow call stacks are enabled.
(aarch64_expand_epilogue): Update calls accordingly.

gcc/testsuite/
* gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores
to happen in offset order.
* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.

Handle epilogues that contain jumps

The prologue/epilogue pass allows the prologue sequence to contain
jumps.  The sequence is then partitioned into basic blocks using
find_many_sub_basic_blocks.

This patch treats epilogues in a similar way.  Since only one block
might need to be split, the patch (re)introduces a find_sub_basic_blocks
routine to handle a single block.

The new routine hard-codes the assumption that split_block will chain
the new block immediately after the original block.  The routine doesn't
try to replicate the fix for PR81030, since that was specific to
gimple->rtl expansion.

The patch is needed for follow-on aarch64 patches that add conditional
code to the epilogue.  The tests are part of those patches.

gcc/
* cfgbuild.h (find_sub_basic_blocks): Declare.
* cfgbuild.cc (update_profile_for_new_sub_basic_block): New function,
split out from...
(find_many_sub_basic_blocks): ...here.
(find_sub_basic_blocks): New function.
* function.cc (thread_prologue_and_epilogue_insns): Handle
epilogues that contain jumps.

ssa_name_has_boolean_range vs signed-boolean:31 types

This turns out to be a latent bug in ssa_name_has_boolean_range
where it would return true for all boolean types but all of the
uses of ssa_name_has_boolean_range was expecting 0/1 as the range
rather than [-1,0].
So when I fixed vector lower to do all comparisons in boolean_type
rather than still in the signed-boolean:31 type (to fix a different issue),
the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which
was signed-boolean:31) had a range of [0,1] which broke down and sometimes
gave us -1/-2 as values rather than what we were expecting of -1/0.

This was the simpliest patch I found while testing.

We have another way of matching [0,1] range which we could use instead
of ssa_name_has_boolean_range except that uses only the global ranges
rather than the local range (during VRP).
I tried to clean this up slightly by using gimple_match_zero_one_valuedp
inside ssa_name_has_boolean_range but that failed because due to using
only the global ranges. I then tried to change get_nonzero_bits to use
the local ranges at the optimization time but that failed also because
we would remove branches to __builtin_unreachable during evrp and lose
information as we don't set the global ranges during evrp.

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/110817

gcc/ChangeLog:

* tree-ssanames.cc (ssa_name_has_boolean_range): Remove the
check for boolean type as they don't have "[0,1]" range.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr110817-1.c: New test.
* gcc.c-torture/execute/pr110817-2.c: New test.
* gcc.c-torture/execute/pr110817-3.c: New test.

c++: accepts-invalid with =delete("") [PR111840]

r6-2367 added a DECL_INITIAL check to cp_parser_simple_declaration
so that we don't emit multiple errors in g++.dg/parse/error57.C.
But that means we don't diagnose

int f1() = delete("george_crumb");

anymore, because fn decls often have error_mark_node in their
DECL_INITIAL. (The code may be allowed one day via https://wg21.link/P2573R0.)

I was hoping I could use cp_parser_error_occurred but that would
regress error57.C.

PR c++/111840

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_declaration): Do cp_parser_error
for FUNCTION_DECLs.

gcc/testsuite/ChangeLog:

* g++.dg/parse/error65.C: New test.

c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.030s.

The ff_fold_immediate flag is unused after this patch but since I'm
using it in the P2564 patch, I'm not removing it now.  Maybe at_eof
can be used instead and then we can remove ff_fold_immediate.

PR c++/111660

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Don't
handle it here.
(cp_fold_r): Handle COND_EXPR here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/hog1.C: New test.
* g++.dg/cpp2a/consteval36.C: New test.

c++: mangling tweaks

Most of this is introducing the abi_check function to reduce the verbosity
of most places that check -fabi-version.

The start_mangling change is to avoid needing to zero-initialize additional
members of the mangling globals, though I'm not actually adding any.

The comment documents existing semantics.

gcc/cp/ChangeLog:

* mangle.cc (abi_check): New.
(write_prefix, write_unqualified_name, write_discriminator)
(write_type, write_member_name, write_expression)
(write_template_arg, write_template_param): Use it.
(start_mangling): Assign from {}.
* cp-tree.h: Update comment.

c++: Add missing auto_diagnostic_groups to constexpr.cc

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_dynamic_cast_fn): Add missing
auto_diagnostic_group.
(cxx_eval_call_expression): Likewise.
(diag_array_subscript): Likewise.
(outside_lifetime_error): Likewise.
(potential_constant_expression_1): Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>

RISC-V/testsuite/pr111466.c: update test and expected output

Update the test to potentially generate two SEXT.W instructions: one for
incoming function arg, other for function return.

But after commit 8eb9cdd14218
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them so fix the expected
assembler output which was errorneously introduced by commit above.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

c: error for function with external and internal linkage [PR111708]

Declaring a function with both external and internal linkage
in the same TU is translation-time UB. Add an error for this
case as already done for objects.

PR c/111708

gcc/c/ChangeLog:

* c-decl.cc (grokdeclarator): Add error.

gcc/testsuite/ChangeLog:

* gcc.dg/pr111708-1.c: New test.
* gcc.dg/pr111708-2.c: New test.

Fortran: out of bounds access with nested implied-do IO [PR111837]

gcc/fortran/ChangeLog:

PR fortran/111837
* frontend-passes.cc (traverse_io_block): Dependency check of loop
nest shall be triangular, not banded.

gcc/testsuite/ChangeLog:

PR fortran/111837
* gfortran.dg/implied_do_io_8.f90: New test.

fortran/intrinsic.texi: Improve SIGNAL intrinsic entry

gcc/fortran/ChangeLog:

* intrinsic.texi (signal): Mention that the argument
passed to the signal handler procedure is passed by reference.
Extend example.

MATCH: [PR111432] Simplify `a & (x | CST)` to a when we know that (a & ~CST) == 0

This adds the simplification `a & (x | CST)` to a when we know that
`(a & ~CST) == 0`. In a similar fashion as `a & CST` is handle.

I looked into handling `a | (x & CST)` but that I don't see any decent
simplifications happening.

OK? Bootstrapped and tested on x86_linux-gnu with no regressions.

PR tree-optimization/111432

gcc/ChangeLog:

* match.pd (`a & (x | CST)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-7.c: New test.

LibF7: Re-generate f7-renames.h to pick up white-space from f7renames.sh.

libgcc/config/avr/libf7/
* f7-renames.h: Re-renerate.

tree-cfg: Add count information when creating new bb in move_sese_region_to_fn

This patch makes sure the profile_count information is initialized for the new
bb created in move_sese_region_to_fn.

gcc/ChangeLog:

* tree-cfg.cc (move_sese_region_to_fn): Initialize profile_count for
new basic block.

PR modula2/111756: Re-building all-gcc after source changes fails to link

When having modula-2 enabled in a development tree and there are any
changes that trigger rebuilds in m2/ doing a 'make all-gcc' in the
build directory might fail due to lack of dependency tracking.  This
patch introduces build dependencies into gcc/m2/Make-lang.in using -M*
options.  The patch also introduces all -M* options to cc1gm2 and gm2.

gcc/m2/ChangeLog:

PR modula2/111756
* Make-lang.in (CM2DEP): New define conditionally set if
($(CXXDEPMODE),depmode=gcc3).
(GM2_1): Use $(CM2DEP).
(m2/gm2-gcc/%.o): Ensure $(@D)/$(DEPDIR) is created.
Add $(CM2DEP) to the $(COMPILER) command and use $(POSTCOMPILE).
(m2/gm2-gcc/m2configure.o): Ditto.
(m2/gm2-lang.o): Ditto.
(m2/m2pp.o): Ditto.
(m2/gm2-gcc/rtegraph.o): Ditto.
(m2/mc-boot/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot-ch/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot-ch/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot/main.o): Ditto.
(mcflex.o): Ditto.
(m2/gm2-libs-boot/M2RTS.o): Ditto.
(m2/gm2-libs-boot/%.o): Ditto.
(m2/gm2-libs-boot/%.o): Ditto.
(m2/gm2-libs-boot/RTcodummy.o): Ditto.
(m2/gm2-libs-boot/RTintdummy.o): Ditto.
(m2/gm2-libs-boot/wrapc.o): Ditto.
(m2/gm2-libs-boot/UnixArgs.o): Ditto.
(m2/gm2-libs-boot/choosetemp.o): Ditto.
(m2/gm2-libs-boot/errno.o): Ditto.
(m2/gm2-libs-boot/dtoa.o): Ditto.
(m2/gm2-libs-boot/ldtoa.o): Ditto.
(m2/gm2-libs-boot/termios.o): Ditto.
(m2/gm2-libs-boot/SysExceptions.o): Ditto.
(m2/gm2-libs-boot/SysStorage.o): Ditto.
(m2/gm2-compiler-boot/M2GCCDeclare.o): Ditto.
(m2/gm2-compiler-boot/M2Error.o): Ditto.
(m2/gm2-compiler-boot/%.o): Ditto.
(m2/gm2-compiler-boot/%.o): Ditto.
(m2/gm2-compiler-boot/m2flex.o): Ditto.
(m2/gm2-compiler/%.o): Ditto.
(m2/gm2-compiler/m2flex.o): Ditto.
(m2/gm2-libs-iso/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
(m2/gm2-libs/choosetemp.o): Ditto.
(m2/boot-bin/mklink$(exeext)): Ditto.
(m2/pge-boot/%.o): Ditto.
(m2/pge-boot/%.o): Ditto.
(m2/gm2-compiler/%.o): Ensure $(@D)/$(DEPDIR) is created and use
$(POSTCOMPILE).
(m2/gm2-compiler/%.o): Ditto.
(m2/gm2-libs-iso/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
* README: Purge out of date info.
* gm2-compiler/M2Comp.mod (MakeSaveTempsFileNameExt): Import.
(OnExitDelete): Import.
(GetModuleDefImportStatementList): Import.
(GetModuleModImportStatementList): Import.
(GetImportModule): Import.
(IsImportStatement): Import.
(IsImport): Import.
(GetImportStatementList): Import.
(File): Import.
(Close): Import.
(EOF): Import.
(IsNoError): Import.
(WriteLine): Import.
(WriteChar): Import.
(FlushOutErr): Import.
(WriteS): Import.
(OpenToRead): Import.
(OpenToWrite): Import.
(ReadS): Import.
(WriteS): Import.
(GetM): Import.
(GetMM): Import.
(GetDepTarget): Import.
(GetMF): Import.
(GetMP): Import.
(GetObj): Import.
(GetMD): Import.
(GetMMD): Import.
(GenerateDefDependency): New procedure.
(GenerateDependenciesFromImport): New procedure.
(GenerateDependenciesFromList): New procedure.
(GenerateDependencies): New procedure.
(Compile): Re-write.
(compile): Re-format.
(CreateFileStem): New procedure function.
(DoPass0): Re-write.
(IsLibrary): New procedure function.
(IsUnique): New procedure function.
(Append): New procedure.
(MergeDep): New procedure.
(GetRuleTarget): New procedure function.
(ReadDepContents): New procedure function.
(WriteDep): New procedure.
(WritePhonyDep): New procedure.
(WriteDepContents): New procedure.
(CreateDepFilename): New procedure function.
(Pass0CheckDef): New procedure function.
(Pass0CheckMod): New procedure function.
(DoPass0): Re-write.
(DepContent): New variable.
(DepOutput): New variable.
(BaseName): New procedure function.
* gm2-compiler/M2GCCDeclare.mod (PrintTerse): Handle IsImport.
Replace IsGnuAsmVolatile with IsGnuAsm.
* gm2-compiler/M2Options.def (EXPORT QUALIFIED): Remove list.
(SetM): New procedure.
(GetM): New procedure function.
(SetMM): New procedure.
(GetMM): New procedure function.
(SetMF): New procedure.
(GetMF): New procedure function.
(SetPPOnly): New procedure.
(GetB): New procedure function.
(SetMD): New procedure.
(GetMD): New procedure function.
(SetMMD): New procedure.
(GetMMD): New procedure function.
(SetMQ): New procedure.
(SetMT): New procedure.
(GetMT): New procedure function.
(GetDepTarget): New procedure function.
(SetMP): New procedure.
(GetMP): New procedure function.
(SetObj): New procedure.
(SetSaveTempsDir): New procedure.
* gm2-compiler/M2Options.mod (SetM): New procedure.
(GetM): New procedure function.
(SetMM): New procedure.
(GetMM): New procedure function.
(SetMF): New procedure.
(GetMF): New procedure function.
(SetPPOnly): New procedure.
(GetB): New procedure function.
(SetMD): New procedure.
(GetMD): New procedure function.
(SetMMD): New procedure.
(GetMMD): New procedure function.
(SetMQ): New procedure.
(SetMT): New procedure.
(GetMT): New procedure function.
(GetDepTarget): New procedure function.
(SetMP): New procedure.
(GetMP): New procedure function.
(SetObj): New procedure.
(SetSaveTempsDir): New procedure.
* gm2-compiler/M2Preprocess.def (PreprocessModule): New parameters
topSource and outputDep.  Re-write.
(MakeSaveTempsFileNameExt): New procedure function.
(OnExitDelete): New procedure function.
* gm2-compiler/M2Preprocess.mod (GetM): Import.
(GetMM): Import.
(OnExitDelete): Add debugging message.
(RemoveFile): Add debugging message.
(BaseName): Remove.
(BuildCommandLineExecute): New procedure function.
* gm2-compiler/M2Search.def (SetDefExtension): Remove unnecessary
spacing.
* gm2-compiler/SymbolTable.mod (GetSymName): Handle ImportSym and
ImportStatementSym.
* gm2-gcc/m2options.h (M2Options_SetMD): New function.
(M2Options_GetMD): New function.
(M2Options_SetMMD): New function.
(M2Options_GetMMD): New function.
(M2Options_SetM): New function.
(M2Options_GetM): New function.
(M2Options_SetMM): New function.
(M2Options_GetMM): New function.
(M2Options_GetMQ): New function.
(M2Options_SetMF): New function.
(M2Options_GetMF): New function.
(M2Options_SetMT): New function.
(M2Options_SetMP): New function.
(M2Options_GetMP): New function.
(M2Options_GetDepTarget): New function.
* gm2-lang.cc (gm2_langhook_init): Correct comment case.
(gm2_langhook_init_options): Add case OPT_M and
OPT_MM.
(gm2_langhook_post_options): Add case OPT_MF, OPT_MT,
OPT_MD and OPT_MMD.
* lang-specs.h (M2CPP): Pass though MF option.
(MDMMD): New define.  Add MDMMD to "@modula-2".

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

tree-optimization/111846 - put simd-clone-info into SLP tree

The following avoids bogously re-using the simd-clone-info we
currently hang off stmt_info from two different SLP contexts where
a different number of lanes should have chosen a different best
simdclone.

PR tree-optimization/111846
* tree-vectorizer.h (_slp_tree::simd_clone_info): Add.
(SLP_TREE_SIMD_CLONE_INFO): New.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
SLP_TREE_SIMD_CLONE_INFO.
(_slp_tree::~_slp_tree): Release it.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
SLP_TREE_SIMD_CLONE_INFO or STMT_VINFO_SIMD_CLONE_INFO
dependent on if we're doing SLP.

* gcc.dg/vect/pr111846.c: New testcase.

wide-int-print: Don't print large numbers hexadecimally for print_dec{,s,u}

The following patch implements printing of wide_int/widest_int numbers
decimally when asked for that using print_dec{,s,u}, even if they have
precision larger than 64 and get_len () above 1 (right now we printed
them hexadecimally and even negative numbers as huge positive hexadecimal).

In order to avoid the expensive division/modulo by 10^19 twice, once to
estimate how many will be needed and another to actually print it, the
patch prints the 19 digit chunks in reverse order (from least significant
to most significant) and then reorders those with linear complexity to form
the right printed number.
Tested with printing both 256 and 320 bit numbers (first as an example
of even number of 19 digit chunks plus one shorter above it, the second
as an example of odd number of 19 digit chunks plus one shorter above it).

The l * HOST_BITS_PER_WIDE_INT / 3 + 3 estimatition thinking about it now
is one byte too much (one byte for -, one for '\0') and too conservative,
so we could go with l * HOST_BITS_PER_WIDE_INT / 3 + 2 as well, or e.g.
l * HOST_BITS_PER_WIDE_INT * 10 / 33 + 3 as even less conservative
estimation (though more expensive to compute in inline code).
But that l * HOST_BITS_PER_WIDE_INT / 4 + 4; is likely one byte too much
as well, 2 bytes for 0x, one byte for '\0' and where does the 4th one come
from?  Of course all of these assuming HOST_BITS_PER_WIDE_INT is a multiple
of 64...

2023-10-17  Jakub Jelinek  <jakub@redhat.com>

* wide-int-print.h (print_dec_buf_size): For length, divide number
of bits by 3 and add 3 instead of division by 4 and adding 4.
* wide-int-print.cc (print_decs): Remove superfluous ()s.  Don't call
print_hex, instead call print_decu on either negated value after
printing - or on wi itself.
(print_decu): Don't call print_hex, instead print even large numbers
decimally.
(pp_wide_int_large): Assume len from print_dec_buf_size is big enough
even if it returns false.
* pretty-print.h (pp_wide_int): Use print_dec_buf_size to check if
pp_wide_int_large should be used.
* tree-pretty-print.cc (dump_generic_node): Use print_hex_buf_size
to compute needed buffer size.

RISC-V: Fix failed testcase when use -cmodel=medany

This little path fix a failed testcase when use -cmodel=medany.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cpymem-1.c: Split check.

LibF7: Implement fma / fmal.

libgcc/config/avr/libf7/
* libf7.h (F7_SIZEOF): New macro.
* libf7-asm.sx: Use F7_SIZEOF instead of magic number "10".
(F7MOD_D_fma_, __fma): New module and function.
(fma) [-mdouble=64]: Define as alias for __fma.
(fmal) [-mlong-double=64]: Define as alias for __fma.
* libf7-common.mk (F7_ASM_PARTS): Add D_fma.

middle-end/111818 - failed DECL_NOT_GIMPLE_REG_P setting of volatile

The following addresses a missed DECL_NOT_GIMPLE_REG_P setting of
a volatile declared parameter which causes inlining to substitute
a constant parameter into a context where its address is required.

The main issue is in update_address_taken which clears
DECL_NOT_GIMPLE_REG_P from the parameter but fails to rewrite it
because is_gimple_reg returns false for volatiles. The following
changes maybe_optimize_var to make the 1:1 correspondence between
clearing DECL_NOT_GIMPLE_REG_P of a register typed decl and
actually rewriting it to SSA.

PR middle-end/111818
* tree-ssa.cc (maybe_optimize_var): When clearing
DECL_NOT_GIMPLE_REG_P always rewrite into SSA.

* gcc.dg/torture/pr111818.c: New testcase.

tree-optimization/111807 - ICE in verify_sra_access_forest

The following addresses build_reconstructed_reference failing to
build references with a different offset than the models and thus
the caller conditional being off. This manifests when attempting
to build a ref with offset 160 from the model BIT_FIELD_REF <l_4827[9], 8, 0>
onto the same base l_4827 but the models offset being 288. This
cannot work for any kind of ref I can think of, not just with
BIT_FIELD_REFs.

PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Only call
build_reconstructed_reference when the offsets are the same.

* gcc.dg/torture/pr111807.c: New testcase.

expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
   5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|                               | # of unexpected case / # of unique unexpected case
|                               |          gcc |          g++ |     gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /    12 |
|    lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this 🙂

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming 😉

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
sext.w a6,a1             <-- this goes away
beq a1,zero,.L4
li a5,0
li a0,0
.L3:
addw a4,a2,a5
addw a5,a3,a5
addw a0,a4,a0
bltu a5,a6,.L3
ret
.L4:
li a0,0
ret
```

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
PR target/111466
gcc/
* expr.cc (expand_expr_real_2): Do not clear SUBREG_PROMOTED_VAR_P.

gcc/testsuite
* gcc.target/riscv/pr111466.c: New test.

LoongArch: Fix vec_initv32qiv16qi template to avoid ICE.

Following test code triggers unrecognized insn ICE on LoongArch target
with "-O3 -mlasx":

void
foo (unsigned char *dst, unsigned char *src)
{
  for (int y = 0; y < 16; y++)
    {
      for (int x = 0; x < 16; x++)
        dst[x] = src[x] + 1;
      dst += 32;
      src += 32;
    }
}

ICE info:
./test.c: In function ‘foo’:
./test.c:8:1: error: unrecognizable insn:
    8 | }
      | ^
(insn 15 14 16 4 (set (reg:V32QI 185 [ vect__24.7 ])
        (vec_concat:V32QI (reg:V16QI 186)
            (const_vector:V16QI [
                    (const_int 0 [0]) repeated x16
                ]))) "./test.c":4:19 -1
     (nil))
during RTL pass: vregs
./test.c:8:1: internal compiler error: in extract_insn, at recog.cc:2791
0x12028023b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /home/panchenghui/upstream/gcc/gcc/rtl-error.cc:108
0x12028026f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /home/panchenghui/upstream/gcc/gcc/rtl-error.cc:116
0x120a03c5b extract_insn(rtx_insn*)
        /home/panchenghui/upstream/gcc/gcc/recog.cc:2791
0x12067ff73 instantiate_virtual_regs_in_insn
        /home/panchenghui/upstream/gcc/gcc/function.cc:1610
0x12067ff73 instantiate_virtual_regs
        /home/panchenghui/upstream/gcc/gcc/function.cc:1983
0x12067ff73 execute
        /home/panchenghui/upstream/gcc/gcc/function.cc:2030

This RTL is generated inside loongarch_expand_vector_group_init function (related
to vec_initv32qiv16qi template). Original impl doesn't ensure all vec_concat arguments
are register type. This patch adds force_reg() to the vec_concat argument generation.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init):
fix impl related to vec_initv32qiv16qi template to avoid ICE.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vec-init-1.c: New test.

LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.

There are two reasons for removing this macro definition:
1. The default in the assembler is to use the nop instruction for filling.
2. For assembly directives: .align [abs-expr[, abs-expr[, abs-expr]]]
   The third expression it is the maximum number of bytes that should be
   skipped by this alignment directive.
   Therefore, it will affect the display of the specified alignment rules
   and affect the operating efficiency.

This modification relies on binutils commit 1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c.
(Since the assembler will add nop based on the .align information when doing relax,
it will cause the conditional branch to go out of bounds during the assembly process.
This submission of binutils solves this problem.)

gcc/ChangeLog:

* config/loongarch/loongarch.h (ASM_OUTPUT_ALIGN_WITH_NOP):
Delete.

Co-authored-by: Chenghua Xu <xuchenghua@loongson.cn>

RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

Consider this following case:
int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
    {
      sum1 += x[2*i] - a;
      sum1 += x[2*i+1] * b;
      sum2 += x[2*i] - b;
      sum2 += x[2*i+1] * a;
    }
  return sum1 + sum2;
}

Before this patch:

bar:
        ble     a3,zero,.L5
        csrr    t0,vlenb
        csrr    a6,vlenb
        slli    t1,t0,3
        vsetvli a5,zero,e32,m4,ta,ma
        sub     sp,sp,t1
        vid.v   v20
        vmv.v.x v12,a1
        vand.vi v4,v20,1
        vmv.v.x v16,a2
        vmseq.vi        v4,v4,1
        slli    t3,a6,2
        vsetvli zero,a5,e32,m4,ta,ma
        vmv1r.v v0,v4
        viota.m v8,v4
        add     a7,t3,sp
        vsetvli a5,zero,e32,m4,ta,mu
        vand.vi v28,v20,-2
        vadd.vi v4,v28,1
        vs4r.v  v20,0(a7)                        -----  spill
        vrgather.vv     v24,v12,v8
        vrgather.vv     v20,v16,v8
        vrgather.vv     v24,v16,v8,v0.t
        vrgather.vv     v20,v12,v8,v0.t
        vs4r.v  v4,0(sp)                          ----- spill
        slli    a3,a3,1
        addi    t4,a6,-1
        neg     t1,a6
        vmv4r.v v0,v20
        vmv.v.i v4,0
        j       .L4
.L13:
        vsetvli a5,zero,e32,m4,ta,ma
.L4:
        mv      a7,a3
        mv      a4,a3
        bleu    a3,a6,.L3
        csrr    a4,vlenb
.L3:
        vmv.v.x v8,t4
        vl4re32.v       v12,0(sp)                ---- spill
        vand.vv v20,v28,v8
        vand.vv v8,v12,v8
        vsetvli zero,a4,e32,m4,ta,ma
        vle32.v v16,0(a0)
        vsetvli a5,zero,e32,m4,ta,ma
        add     a3,a3,t1
        vrgather.vv     v12,v16,v20
        add     a0,a0,t3
        vrgather.vv     v20,v16,v8
        vsub.vv v12,v12,v0
        vsetvli zero,a4,e32,m4,tu,ma
        vadd.vv v4,v4,v12
        vmacc.vv        v4,v24,v20
        bgtu    a7,a6,.L13
        csrr    a1,vlenb
        slli    a1,a1,2
        add     a1,a1,sp
        li      a4,-1
        csrr    t0,vlenb
        vsetvli a5,zero,e32,m4,ta,ma
        vl4re32.v       v12,0(a1)               ---- spill
        vmv.v.i v8,0
        vmul.vx v0,v12,a4
        li      a2,0
        slli    t1,t0,3
        vadd.vi v0,v0,-1
        vand.vi v0,v0,1
        vmseq.vv        v0,v0,v8
        vand.vi v12,v12,1
        vmerge.vvm      v16,v8,v4,v0
        vmseq.vv        v12,v12,v8
        vmv.s.x v1,a2
        vmv1r.v v0,v12
        vredsum.vs      v16,v16,v1
        vmerge.vvm      v8,v8,v4,v0
        vmv.x.s a0,v16
        vredsum.vs      v8,v8,v1
        vmv.x.s a5,v8
        add     sp,sp,t1
        addw    a0,a0,a5
        jr      ra
.L5:
        li      a0,0
        ret

We can there are multiple horrible register spillings.
The root cause of this issue is for a scalar IR load:

_5 = *_4;

We didn't check whether it is a continguous load/store or gather/scatter load/store

Since it will be translate into:

   1. MASK_LEN_GATHER_LOAD (..., perm indice).
   2. Continguous load/store + VEC_PERM (..., perm indice)

It's obvious that no matter which situation, we will end up with consuming one vector register group (perm indice)
that we didn't count it before.

So this case we pick LMUL = 4 which is incorrect choice for dynamic LMUL cost model.

The key of this patch is:

  if ((type == load_vec_info_type || type == store_vec_info_type)
      && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
    {
               ...
            }

Add one more register consumption if it is not an adjacent load/store.

After this patch, it pick LMUL = 2 which is optimal:

bar:
ble a3,zero,.L4
csrr a6,vlenb
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v6,a2
srli a2,a6,1
vmv.v.x v4,a1
vid.v v12
slli a3,a3,1
vand.vi v0,v12,1
addi t1,a2,-1
vmseq.vi v0,v0,1
slli a6,a6,1
vsetvli zero,a5,e32,m2,ta,ma
neg a7,a2
viota.m v2,v0
vsetvli a5,zero,e32,m2,ta,mu
vrgather.vv v16,v4,v2
vrgather.vv v14,v6,v2
vrgather.vv v16,v6,v2,v0.t
vrgather.vv v14,v4,v2,v0.t
vand.vi v18,v12,-2
vmv.v.i v2,0
vadd.vi v20,v18,1
.L3:
minu a4,a3,a2
vsetvli zero,a4,e32,m2,ta,ma
vle32.v v8,0(a0)
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v4,t1
vand.vv v10,v18,v4
vrgather.vv v6,v8,v10
vsub.vv v6,v6,v14
vsetvli zero,a4,e32,m2,tu,ma
vadd.vv v2,v2,v6
vsetvli a1,zero,e32,m2,ta,ma
vand.vv v4,v20,v4
vrgather.vv v6,v8,v4
vsetvli zero,a4,e32,m2,tu,ma
mv a4,a3
add a0,a0,a6
add a3,a3,a7
vmacc.vv v2,v16,v6
bgtu a4,a2,.L3
vsetvli a1,zero,e32,m2,ta,ma
vand.vi v0,v12,1
vmv.v.i v4,0
li a3,-1
vmseq.vv v0,v0,v4
vmv.s.x v1,zero
vmerge.vvm v6,v4,v2,v0
vredsum.vs v6,v6,v1
vmul.vx v0,v12,a3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmv.x.s a4,v6
vmseq.vv v0,v0,v4
vmv.s.x v1,zero
vmerge.vvm v4,v4,v2,v0
vredsum.vs v4,v4,v1
vmv.x.s a0,v4
addw a0,a0,a4
ret
.L4:
li a0,0
ret

No spillings.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Fix big LMUL issue.
(get_store_value): New function.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: New test.

d: Forbid taking the address of an intrinsic with no implementation

This code fails to link:

    import core.math;
    real function(real) fn = &sin;

However, when called directly, the D intrinsic `sin()' is expanded by
the front-end into the GCC built-in `__builtin_sin()'.  This has been
fixed to now also expand the function when a reference is taken.

As there are D intrinsics and GCC built-ins that don't have a fallback
implementation, raise an error if taking the address is not possible.

gcc/d/ChangeLog:

* d-tree.h (intrinsic_code): Update define for DEF_D_INTRINSIC.
(maybe_reject_intrinsic): New prototype.
* expr.cc (ExprVisitor::visit (SymOffExp *)): Call
maybe_reject_intrinsic.
* intrinsics.cc (intrinsic_decl): Add fallback field.
(intrinsic_decls): Update define for DEF_D_INTRINSIC.
(maybe_reject_intrinsic): New function.
* intrinsics.def (DEF_D_LIB_BUILTIN): Update.
(DEF_CTFE_BUILTIN): Update.
(INTRINSIC_BSF): Declare as library builtin.
(INTRINSIC_BSR): Likewise.
(INTRINSIC_BT): Likewise.
(INTRINSIC_BSF64): Likewise.
(INTRINSIC_BSR64): Likewise.
(INTRINSIC_BT64): Likewise.
(INTRINSIC_POPCNT32): Likewise.
(INTRINSIC_POPCNT64): Likewise.
(INTRINSIC_ROL): Likewise.
(INTRINSIC_ROL_TIARG): Likewise.
(INTRINSIC_ROR): Likewise.
(INTRINSIC_ROR_TIARG): Likewise.
(INTRINSIC_ADDS): Likewise.
(INTRINSIC_ADDSL): Likewise.
(INTRINSIC_ADDU): Likewise.
(INTRINSIC_ADDUL): Likewise.
(INTRINSIC_SUBS): Likewise.
(INTRINSIC_SUBSL): Likewise.
(INTRINSIC_SUBU): Likewise.
(INTRINSIC_SUBUL): Likewise.
(INTRINSIC_MULS): Likewise.
(INTRINSIC_MULSL): Likewise.
(INTRINSIC_MULU): Likewise.
(INTRINSIC_MULUI): Likewise.
(INTRINSIC_MULUL): Likewise.
(INTRINSIC_NEGS): Likewise.
(INTRINSIC_NEGSL): Likewise.
(INTRINSIC_TOPRECF): Likewise.
(INTRINSIC_TOPREC): Likewise.
(INTRINSIC_TOPRECL): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/builtins_reject.d: New test.
* gdc.dg/intrinsics_reject.d: New test.

Daily bump.

Fix minor problem in stack probing

probe_stack_range has an assert to capture the possibility that that
expand_binop might not construct its result in the provided target.

We triggered that internally a little while ago. I'm pretty sure it was in the
testsuite, so no new testcase. The fix is easy, copy the result into the
proper target when needed.

Bootstrapped and regression tested on x86.

gcc/
* explow.cc (probe_stack_range): Handle case when expand_binop
does not construct its result in the expected location.

diagnostics: special-case -fdiagnostics-text-art-charset=ascii for LANG=C

In the LWN discussion of the "ASCII" art in GCC 14
https://lwn.net/Articles/946733/#Comments
there was some concern about the use of non-ASCII characters in the
output.

Currently -fdiagnostics-text-art-charset defaults to "emoji".
To better handle older terminals by default, this patch special-cases
LANG=C to use -fdiagnostics-text-art-charset=ascii.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): When LANG=C, update
default for -fdiagnostics-text-art-charset from emoji to ascii.
* doc/invoke.texi (fdiagnostics-text-art-charset): Document the above.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: fix missing initialization of context->extra_output_kind

gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): Ensure
context->extra_output_kind is initialized.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

i386: Allow -mlarge-data-threshold with -mcmodel=large

From: Fangrui Song <maskray@google.com>

When using -mcmodel=medium, large data objects larger than the
-mlarge-data-threshold threshold are placed into large data sections
(.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
.l* sections into separate output sections.  If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply.  This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections and drops an
unneeded documentation restriction that the value must be the same.

Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")

Signed-off-by: Fangrui Song <maskray@google.com>
gcc/ChangeLog:

* config/i386/i386.cc (ix86_can_inline_p):
Handle CM_LARGE and CM_LARGE_PIC.
(x86_elf_aligned_decl_common): Ditto.
(x86_output_aligned_bss): Ditto.
* config/i386/i386.opt: Update doc for -mlarge-data-threshold=.
* doc/invoke.texi: Update doc for -mlarge-data-threshold=.

gcc/testsuite/ChangeLog:

* gcc.target/i386/large-data.c: New test.

RISC-V: NFC: Move scalar block move expansion code into riscv-string.cc

This just moves a few functions out of riscv.cc into riscv-string.cc in an
attempt to keep riscv.cc manageable.  This was originally Christoph's code and
I'm just pushing it on his behalf.

Full disclosure: I built rv64gc after changing to verify everything still
builds.  Given it was just lifting code from one place to another, I didn't run
the testsuite.

gcc/
* config/riscv/riscv-protos.h (emit_block_move): Remove redundant
prototype.  Improve comment.
* config/riscv/riscv.cc (riscv_block_move_straight): Move from riscv.cc
into riscv-string.cc.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
* config/riscv/riscv-string.cc (riscv_block_move_straight): Add moved
function.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.

RISC-V/testsuite: add a default march (lacking zfa) to some fp tests

A bunch of FP tests expecting specific FP asm output fail when built
with zfa because different insns are generated. And this happens
because those tests don't have an explicit -march and the default
used to configure gcc could end up with zfa causing the false fails.

Fix that by adding the -march explicitly which doesn't have zfa.

BTW it seems we have some duplication in tests for zfa and non-zfa and
it would have been better if they were consolidated, but oh well.

gcc/testsuite:
* gcc.target/riscv/fle-ieee.c: Updates dg-options with
explicit -march=rv64gc and -march=rv32gc.
* gcc.target/riscv/fle-snan.c: Ditto.
* gcc.target/riscv/fle.c: Ditto.
* gcc.target/riscv/flef-ieee.c: Ditto.
* gcc.target/riscv/flef.c: Ditto.
* gcc.target/riscv/flef-snan.c: Ditto.
* gcc.target/riscv/flt-ieee.c: Ditto.
* gcc.target/riscv/flt-snan.c: Ditto.
* gcc.target/riscv/fltf-ieee.c: Ditto.
* gcc.target/riscv/fltf-snan.c: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

Implement new RTL optimizations pass: fold-mem-offsets

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.
* gcc.target/i386/pr52146.c: Adjust expected output.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>

d: Merge upstream dmd, druntime 4c18eed967, phobos d945686a4.

D front-end changes:

- Import latest fixes to mainline.

D runtime changes:

- Import latest fixes to mainline.

Phobos changes:

- Import latest fixes to mainline.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 4c18eed967.
* d-diagnostic.cc (verrorReport): Update for new front-end interface.
(verrorReportSupplemental): Likewise.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Likewise.
(d_parse_file): Likewise.
* decl.cc (get_symbol_decl): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 4c18eed967.
* src/MERGE: Merge upstream phobos d945686a4.

MATCH: Improve `A CMP 0 ? A : -A` set of patterns to use bitwise_equal_p.

This improves the `A CMP 0 ? A : -A` set of match patterns to use
bitwise_equal_p which allows an nop cast between signed and unsigned.
This allows catching a few extra cases which were not being caught before.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/101541
* match.pd (A CMP 0 ? A : -A): Improve
using bitwise_equal_p.

gcc/testsuite/ChangeLog:

PR tree-optimization/101541
* gcc.dg/tree-ssa/phi-opt-36.c: New test.
* gcc.dg/tree-ssa/phi-opt-37.c: New test.

[PR31531] MATCH: Improve ~a < ~b and ~a < CST, allow a nop cast inbetween ~ and a/b

Currently we able to simplify `~a CMP ~b` to `b CMP a` but we should allow a nop
conversion in between the `~` and the `a` which can show up. A similarly thing should
be done for `~a CMP CST`.

I had originally submitted the `~a CMP CST` case as
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585088.html;
I noticed we should do the same thing for the `~a CMP ~b` case and combined
it with that one here.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/31531

gcc/ChangeLog:

* match.pd (~X op ~Y): Allow for an optional nop convert.
(~X op C): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr31531-1.c: New test.
* gcc.dg/tree-ssa/pr31531-2.c: New test.

c++: improve fold-expr location

I want to distinguish between constraint && and fold-expressions there of
written by the user and those implied by template parameter
type-constraints; to that end, let's improve our EXPR_LOCATION for an
explicit fold-expression.

The fold3.C change is needed because this moves the caret from the end of
the expression to the operator, which means the location of the error refers
to the macro invocation rather than the macro definition; both locations are
still printed, but which one is an error and which a note changes.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_fold_expression): Track location range.
* semantics.cc (finish_unary_fold_expr)
(finish_left_unary_fold_expr, finish_right_unary_fold_expr)
(finish_binary_fold_expr): Add location parm.
* constraint.cc (finish_shorthand_constraint): Pass it.
* pt.cc (convert_generic_types_to_packs): Likewise.
* cp-tree.h: Adjust.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic3.C: Add expected column.
* g++.dg/cpp1z/fold3.C: Adjust diagnostic lines.

c++: fix truncated diagnostic in C++23 [PR111272]

In C++23, since P2448, a constexpr function F that calls a non-constexpr
function N is OK as long as we don't actually call F in a constexpr
context.  So instead of giving an error in maybe_save_constexpr_fundef,
we only give an error when evaluating the call.  Unfortunately, as shown
in this PR, the diagnostic can be truncated:

z.C:10:13: note: 'constexpr Jam::Jam()' is not usable as a 'constexpr' function because:
   10 |   constexpr Jam() { ft(); }
      |             ^~~

...because what?  With this patch, we say:

z.C:10:13: note: 'constexpr Jam::Jam()' is not usable as a 'constexpr' function because:
   10 |   constexpr Jam() { ft(); }
      |             ^~~
z.C:10:23: error: call to non-'constexpr' function 'int Jam::ft()'
   10 |   constexpr Jam() { ft(); }
      |                     ~~^~
z.C:8:7: note: 'int Jam::ft()' declared here
    8 |   int ft() { return 42; }
      |       ^~

Like maybe_save_constexpr_fundef, explain_invalid_constexpr_fn should
also check the body of a constructor, not just the mem-initializer.

PR c++/111272

gcc/cp/ChangeLog:

* constexpr.cc (explain_invalid_constexpr_fn): Also check the body of
a constructor in C++14 and up.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-diag1.C: New test.